array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
I've got a raid5 array with 5 disks where 2 failed. The failures are
occasional and only on a few sectors so I tried to assemble it with 4
disks anyway:

# mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4

However mdadm complains that one of the disks has an out-of-date
superblock and kicks it out, and then it cannot run the array with
only 3 disks.

Shouldn't it adjust the superblock and assemble-run it anyway? That's
what -f is for, no? This is with kernel 2.6.22.16 and mdadm 2.6.4.
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: array doesn't run even with --force

2008-01-20 Thread Neil Brown
On Sunday January 20, [EMAIL PROTECTED] wrote:
> I've got a raid5 array with 5 disks where 2 failed. The failures are
> occasional and only on a few sectors so I tried to assemble it with 4
> disks anyway:
>
> # mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4
>
> However mdadm complains that one of the disks has an out-of-date
> superblock and kicks it out, and then it cannot run the array with
> only 3 disks.
>
> Shouldn't it adjust the superblock and assemble-run it anyway? That's
> what -f is for, no? This is with kernel 2.6.22.16 and mdadm 2.6.4.

Please provide actual commands and actual output.
Also add --verbose to the assemble command
Also provide --examine for all devices.
Also provide any kernel log messages.

Thanks,
NeilBrown


Re: array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:13:
> On Sunday January 20, [EMAIL PROTECTED] wrote:
> > I've got a raid5 array with 5 disks where 2 failed. The failures are
> > occasional and only on a few sectors so I tried to assemble it with 4
> > disks anyway:
> >
> > # mdadm -A -f -R /dev/mdnumber /dev/disk1 /dev/disk2 /dev/disk3 /dev/disk4
> >
> > However mdadm complains that one of the disks has an out-of-date
> > superblock and kicks it out, and then it cannot run the array with
> > only 3 disks.
> >
> > Shouldn't it adjust the superblock and assemble-run it anyway? That's
> > what -f is for, no? This is with kernel 2.6.22.16 and mdadm 2.6.4.
>
> Please provide actual commands and actual output.
> Also add --verbose to the assemble command
> Also provide --examine for all devices.
> Also provide any kernel log messages.

The command is

mdadm -A --verbose -f -R /dev/md3 /dev/sda4 /dev/sdc4 /dev/sde4 /dev/sdd4

The failed areas are sdb4 (which I didn't include above) and sdd4. I
did a dd if=/dev/sdb4 of=/dev/hda4 bs=512 conv=noerror and it
complained about roughly 10 bad sectors. I did dd if=/dev/sdd4
of=/dev/hdc4 bs=512 conv=noerror and there were no errors, that's why
I used sdd4 above. I tried to substitute hdc4 for sdd4, and hda4 for
sdb4, to no avail.
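
A note on the dd copies described above: with GNU dd, conv=noerror on its own writes nothing for a block it could not read, so everything after a bad sector lands at a shifted offset in the copy. That may be part of why a copy taken from a disk with bad sectors behaves differently from the original. Adding sync pads each failed read with zeros and keeps the offsets aligned. A minimal sketch, using the device names from this message; the copy_member wrapper name is hypothetical:

```shell
# Hypothetical helper: copy a failing RAID member to a spare partition
# while keeping every sector at its original offset.
copy_member() {
    src=$1
    dst=$2
    # noerror: keep going after a read error
    # sync:    pad each short or failed read with zeros up to the block size
    dd if="$src" of="$dst" bs=512 conv=noerror,sync
}

# Example with the devices named in this message. Run as root and
# double-check the target first; dd overwrites it without asking.
# copy_member /dev/sdb4 /dev/hda4
```

The small block size matters here: with bs=512 only the unreadable sector itself is zero-filled, at the cost of a slower copy.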

I don't have kernel logs because the failed area has /home and /var.
The double fault occurred during the holidays, so I don't know which
happened first. Below are the output of the command above and of
--examine.

mdadm: looking for devices for /dev/md3
mdadm: /dev/sda4 is identified as a member of /dev/md3, slot 0.
mdadm: /dev/sdc4 is identified as a member of /dev/md3, slot 2.
mdadm: /dev/sde4 is identified as a member of /dev/md3, slot 4.
mdadm: /dev/sdd4 is identified as a member of /dev/md3, slot 5.
mdadm: no uptodate device for slot 1 of /dev/md3
mdadm: added /dev/sdc4 to /dev/md3 as 2
mdadm: no uptodate device for slot 3 of /dev/md3
mdadm: added /dev/sde4 to /dev/md3 as 4
mdadm: added /dev/sdd4 to /dev/md3 as 5
mdadm: added /dev/sda4 to /dev/md3 as 0
mdadm: failed to RUN_ARRAY /dev/md3: Input/output error
mdadm: Not enough devices to start the array.

On screen it shows kicking out of date... for sdd4.

/dev/sda4:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 2f2f8327:375b4306:94521055:e3dc373b
  Creation Time : Tue May 11 16:03:35 2004
 Raid Level : raid5
  Used Dev Size : 70454400 (67.19 GiB 72.15 GB)
 Array Size : 281817600 (268.76 GiB 288.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

Update Time : Wed Jan 16 16:00:53 2008
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
   Checksum : 16119868 - correct
 Events : 0.14967284

 Layout : left-symmetric
 Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     0       8        4        0      active sync   /dev/sda4

   0     0       8        4        0      active sync   /dev/sda4
   1     1       0        0        1      active sync   <- note the difference compared to sdc4
   2     2       8       36        2      active sync   /dev/sdc4
   3     3       0        0        3      faulty removed
   4     4       8       68        4      active sync   /dev/sde4

/dev/sdc4:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 2f2f8327:375b4306:94521055:e3dc373b
  Creation Time : Tue May 11 16:03:35 2004
 Raid Level : raid5
  Used Dev Size : 70454400 (67.19 GiB 72.15 GB)
 Array Size : 281817600 (268.76 GiB 288.58 GB)
   Raid Devices : 5
  Total Devices : 4
Preferred Minor : 3

Update Time : Wed Jan 16 16:00:53 2008
  State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 2
  Spare Devices : 0
   Checksum : 1611988f - correct
 Events : 0.14967284

 Layout : left-symmetric
 Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this     2       8       36        2      active sync   /dev/sdc4

   0     0       8        4        0      active sync   /dev/sda4
   1     1       0        0        1      faulty removed
   2     2       8       36        2      active sync   /dev/sdc4
   3     3       0        0        3      faulty removed
   4     4       8       68        4      active sync   /dev/sde4

/dev/sdd4:
  Magic : a92b4efc
Version : 00.90.00
   UUID : 2f2f8327:375b4306:94521055:e3dc373b
  Creation Time : Tue May 11 16:03:35 2004
 Raid Level : raid5
  Used Dev Size : 70454400 (67.19 GiB 72.15 GB)
 Array Size : 281817600 (268.76 GiB 288.58 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 3

Update Time : Fri Jan 11 18:45:17 2008
  State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 1
   Checksum : 160b27ce - correct
 Events : 0.14967266

 Layout : left-symmetric
 Chunk Size : 128K

  Number   Major   Minor   RaidDevice 

Re: array doesn't run even with --force

2008-01-20 Thread Neil Brown
On Monday January 21, [EMAIL PROTECTED] wrote:
 
> The command is
>
> mdadm -A --verbose -f -R /dev/md3 /dev/sda4 /dev/sdc4 /dev/sde4 /dev/sdd4
>
> The failed areas are sdb4 (which I didn't include above) and sdd4. I
> did a dd if=/dev/sdb4 of=/dev/hda4 bs=512 conv=noerror and it
> complained about roughly 10 bad sectors. I did dd if=/dev/sdd4
> of=/dev/hdc4 bs=512 conv=noerror and there were no errors, that's why
> I used sdd4 above. I tried to substitute hdc4 for sdd4, and hda4 for
> sdb4, to no avail.
>
> I don't have kernel logs because the failed area has /home and /var.
> The double fault occurred during the holidays, so I don't know which
> happened first. Below are the output of the command above and of
> --examine.
>
> mdadm: looking for devices for /dev/md3
> mdadm: /dev/sda4 is identified as a member of /dev/md3, slot 0.
> mdadm: /dev/sdc4 is identified as a member of /dev/md3, slot 2.
> mdadm: /dev/sde4 is identified as a member of /dev/md3, slot 4.
> mdadm: /dev/sdd4 is identified as a member of /dev/md3, slot 5.
> mdadm: no uptodate device for slot 1 of /dev/md3
> mdadm: added /dev/sdc4 to /dev/md3 as 2
> mdadm: no uptodate device for slot 3 of /dev/md3
> mdadm: added /dev/sde4 to /dev/md3 as 4
> mdadm: added /dev/sdd4 to /dev/md3 as 5
> mdadm: added /dev/sda4 to /dev/md3 as 0
> mdadm: failed to RUN_ARRAY /dev/md3: Input/output error
> mdadm: Not enough devices to start the array.

So no device claims to be member '1' or '3' of the array, and as you
cannot start an array with 2 devices missing, there is nothing that
mdadm can do.  It has no way of knowing what should go in as '1' or
'3'.

As you note, sda4 says that it thinks slot 1 is still active/sync, but
it doesn't seem to know which device should go there either.
However that does indicate that slot 3 failed first and slot 1 failed
later.  So if we have candidates for both, slot 1 is probably more
uptodate.
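
The reasoning above rests on comparing what each member's superblock records. A small sketch for pulling the Update Time and Events fields out of mdadm --examine output so that stale members stand out; the summarize_examine function name is hypothetical, and the field layout assumed is the 0.90 superblock format shown earlier in the thread:

```shell
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# one line per device: device name, event counter, last update time.
summarize_examine() {
    awk '
        # A line such as "/dev/sda4:" starts the block for a new device.
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        # Remember the update time; it appears before the Events line.
        /Update Time :/ { sub(/^.*Update Time : /, ""); utime = $0 }
        # The event counter is the last field on the Events line.
        /Events :/ { print dev, $NF, utime }
    '
}

# Example (run as root):
# mdadm --examine /dev/sda4 /dev/sdc4 /dev/sdd4 /dev/sde4 | summarize_examine
```

On the output posted in this thread, sdd4 would show Events 0.14967266 and an update time of Jan 11, against 0.14967284 and Jan 16 for sda4 and sdc4, which matches sdd4 being the member that dropped out first.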

You need to tell mdadm what goes where by creating the array.
e.g. if you think that sdb4 is adequately reliable and that it was in
slot 1, then

 mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 /dev/sdb4 /dev/sdc4 missing /dev/sde4

alternately if you think it best to use sdd, and it was in slot 3,
then

 mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 missing /dev/sdc4 /dev/sdd4 /dev/sde4

would be the command to use.

Note that this command will not touch any data.  It will just
overwrite the superblock and assemble the array.
You can then 'fsck' or whatever to confirm that the data looks good.
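
Because the order of devices on the create command line decides which slot each one occupies, it can help to build the command from an explicit slot list and review it before running anything. A minimal sketch that only prints the command; the print_create_cmd name is hypothetical, and the array name, level, device count and chunk size are copied from the commands above:

```shell
# Hypothetical helper: print (not run) an mdadm create command for the
# 5-slot RAID5 discussed here. Pass one argument per slot, in slot order,
# using the word "missing" for the slot to leave empty.
print_create_cmd() {
    if [ "$#" -ne 5 ]; then
        echo "need exactly 5 slot arguments" >&2
        return 1
    fi
    missing=0
    for dev in "$@"; do
        if [ "$dev" = "missing" ]; then
            missing=$((missing + 1))
        fi
    done
    # RAID5 can run with at most one member absent.
    if [ "$missing" -gt 1 ]; then
        echo "RAID5 cannot start with $missing members missing" >&2
        return 1
    fi
    echo "mdadm -C /dev/md3 -l5 -n5 -c 128 $*"
}

# Example: sdb4 in slot 1, slot 3 left empty.
# print_create_cmd /dev/sda4 /dev/sdb4 /dev/sdc4 missing /dev/sde4
```

Two cautions, stated as assumptions rather than as part of the advice above: the re-created superblock has to match the original parameters (0.90 metadata, 128K chunk, left-symmetric layout in the --examine output), and newer mdadm releases default to a different metadata version, so it would likely need to be given explicitly there. A no-change check such as fsck -n (for ext2/ext3) is a reasonable first look at the result before mounting anything read-write.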

good luck.

NeilBrown


Re: array doesn't run even with --force

2008-01-20 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 14:09:
> As you note, sda4 says that it thinks slot 1 is still active/sync, but
> it doesn't seem to know which device should go there either.
> However that does indicate that slot 3 failed first and slot 1 failed
> later.  So if we have candidates for both, slot 1 is probably more
> uptodate.

I was going home (it's 1h20 past midnight) when I remembered and came
back to write that assembling with
/dev/sda4 /dev/sdb4 /dev/sdc4 missing /dev/sde4

works, which confirms what you say. When I add sdd4 back it starts
resyncing; however, since sdb4 has errors, a double fault happens
again and the array fails.

> You need to tell mdadm what goes where by creating the array.
> e.g. if you think that sdb4 is adequately reliable and that it was in
> slot 1, then
>
>  mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 /dev/sdb4 /dev/sdc4 missing /dev/sde4
>
> alternately if you think it best to use sdd, and it was in slot 3,
> then
>
>  mdadm -C /dev/md3 -l5 -n5 -c 128 /dev/sda4 missing /dev/sdc4 /dev/sdd4 /dev/sde4
>
> would be the command to use.
>
> Note that this command will not touch any data.  It will just
> overwrite the superblock and assemble the array.
> You can then 'fsck' or whatever to confirm that the data looks good.

I have two possibilities: use sdd4 in slot 3, or use the dump of sdb4
on another disk in slot 1. This copy is more recent but has errors. Is
it possible to know which would be less bad before I fsck?