Re: On the subject of RAID-6 corruption recovery

2008-01-07 Thread Thiemo Nagel
What you call pathological cases are very common when it comes to real-world
data.  It is not at all unusual to find sectors filled with only a constant
value (usually zero, but not always), in which case your **512 becomes **1.


Of course, it would be easy to check on a case-by-case basis how many of the
512 bytes really differ, to correct the exponent accordingly, and to perform
the recovery only when the corrected probability of introducing an error is
sufficiently low.


Kind regards,

Thiemo Nagel
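
To make the suggestion concrete, here is a rough sketch of one possible
reading of it (illustration only, not mdadm or kernel code from the thread):
count how many distinct byte values actually occur in the sector, use that
count as the effective exponent instead of a flat 512, and allow automatic
recovery only when the resulting miscorrection probability stays below a
threshold.  The per-byte match probability P_BYTE and the threshold MAX_RISK
are made-up placeholders, not values taken from the discussion.

#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512
#define P_BYTE      (1.0 / 256.0)  /* placeholder: assumed per-byte chance of an accidental match */
#define MAX_RISK    1e-12          /* placeholder: acceptable probability of a wrong "repair" */

/* Count how many distinct byte values occur in the sector.  A sector filled
 * with a single constant value yields 1, the degenerate case in which the
 * **512 above collapses to **1. */
static size_t effective_exponent(const uint8_t sector[SECTOR_SIZE])
{
	bool seen[256] = { false };
	size_t distinct = 0;
	size_t i;

	for (i = 0; i < SECTOR_SIZE; i++) {
		if (!seen[sector[i]]) {
			seen[sector[i]] = true;
			distinct++;
		}
	}
	return distinct;
}

/* Perform automatic recovery only when the corrected probability of
 * introducing an error is sufficiently low. */
static bool recovery_is_safe(const uint8_t sector[SECTOR_SIZE])
{
	return pow(P_BYTE, (double)effective_exponent(sector)) <= MAX_RISK;
}

Whether the distinct-value count is the right measure of independence is
exactly the kind of case-by-case judgement the paragraph above asks for; it
is only one way to bound the effective exponent.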


Re: On the subject of RAID-6 corruption recovery

2008-01-07 Thread Mattias Wadenstein

On Mon, 7 Jan 2008, Thiemo Nagel wrote:

What you call pathological cases are very common when it comes to real-world
data.  It is not at all unusual to find sectors filled with only a constant
value (usually zero, but not always), in which case your **512 becomes **1.


Of course, it would be easy to check on a case-by-case basis how many of the
512 bytes really differ, to correct the exponent accordingly, and to perform
the recovery only when the corrected probability of introducing an error is
sufficiently low.


What is the alternative to recovery, really? Just erroring out and letting
the admin deal with it, or blindly assuming that the parity is wrong?


/Mattias Wadenstein


Re: On the subject of RAID-6 corruption recovery

2008-01-07 Thread Thiemo Nagel

Mattias Wadenstein wrote:

On Mon, 7 Jan 2008, Thiemo Nagel wrote:

What you call pathological cases are very common when it comes to real-world
data.  It is not at all unusual to find sectors filled with only a constant
value (usually zero, but not always), in which case your **512 becomes **1.


Of course, it would be easy to check on a case-by-case basis how many of the
512 bytes really differ, to correct the exponent accordingly, and to perform
the recovery only when the corrected probability of introducing an error is
sufficiently low.


What is the alternative to recovery, really? Just erroring out and letting
the admin deal with it, or blindly assuming that the parity is wrong?


Currently, 'repair' does a blind recalculation of parity.  The only benefit
of that is (correct me if I'm wrong) to ascertain that repeated reads return
identical data.


The last time I checked, there was not even a warning message.

Kind regards,

Thiemo Nagel


Re: Raid 1, can't get the second disk added back in.

2008-01-07 Thread Jim

Neil Brown wrote:

On Saturday January 5, [EMAIL PROTECTED] wrote:
  

[EMAIL PROTECTED]:~# mdadm /dev/md0 --add /dev/hdb5
mdadm: Cannot open /dev/hdb5: Device or resource busy

All the solutions I've been able to google fail with the same 'busy' error.
There is nothing I can find that might be using /dev/hdb5 except the RAID
device, and it appears that it isn't either.



Very odd. But something must be using it.

What does
   ls -l /sys/block/hdb/hdb5/holders
show?
What about
   cat /proc/mounts
   cat /proc/swaps
   lsof /dev/hdb5
  
??

NeilBrown

  
I agree, but for the life of me I can't figure out what it could be, other
than some RAID daemon.


[EMAIL PROTECTED]:~# ls -l /sys/block/hdb/hdb5/holders
total 0
[EMAIL PROTECTED]:~# cat /proc/mounts
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw 0 0
/dev/disk/by-uuid/4f67dae8-cdcb-460e-86cd-a5f0e4009422 / ext3 rw,data=ordered 0 0
/dev/disk/by-uuid/4f67dae8-cdcb-460e-86cd-a5f0e4009422 /dev/.static/dev ext3 rw,data=ordered 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /dev/shm tmpfs rw 0 0
devpts /dev/pts devpts rw 0 0
usbfs /dev/bus/usb/.usbfs usbfs rw 0 0
udev /proc/bus/usb tmpfs rw 0 0
usbfs /proc/bus/usb/.usbfs usbfs rw 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
/dev/sda5 /home ext3 rw,data=ordered 0 0
/dev/md0 /backupmirror ext3 rw,data=ordered 0 0
/dev/hda1 /vz ext3 rw,data=ordered 0 0
/vz/private/225 /vz/root/225 simfs rw 0 0
/vz/private/300 /vz/root/300 simfs rw 0 0
/proc /vz/root/225/proc proc rw 0 0
/sys /vz/root/225/sys sysfs rw 0 0
none /vz/root/225/dev/pts devpts rw 0 0
/proc /vz/root/300/proc proc rw 0 0
/sys /vz/root/300/sys sysfs rw 0 0
none /vz/root/300/dev/pts devpts rw 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
[EMAIL PROTECTED]:~# cat /proc/mounts | grep hdb
[EMAIL PROTECTED]:~# cat /proc/swaps
[EMAIL PROTECTED]:~# lsof /dev/hdb5



Re: Raid 1, can't get the second disk added back in.

2008-01-07 Thread Jim

Neil Brown wrote:

On Saturday January 5, [EMAIL PROTECTED] wrote:
  

[EMAIL PROTECTED]:~# mdadm /dev/md0 --add /dev/hdb5
mdadm: Cannot open /dev/hdb5: Device or resource busy

All the solutions I've been able to google fail with the same 'busy' error.
There is nothing I can find that might be using /dev/hdb5 except the RAID
device, and it appears that it isn't either.



Very odd. But something must be using it.

What does
   ls -l /sys/block/hdb/hdb5/holders
show?
What about
   cat /proc/mounts
   cat /proc/swaps
   lsof /dev/hdb5
  
??

NeilBrown

  
The problem is not RAID, or at least not obviously RAID-related.  The
problem is that the whole disk, /dev/hdb, is unavailable.


[EMAIL PROTECTED]:~# for i in /dev/hdb?
 do
 mount $i /mnt
 done
mount: /dev/hdb1 already mounted or /mnt busy
mount: /dev/hdb2 already mounted or /mnt busy
mount: /dev/hdb3 already mounted or /mnt busy
mount: /dev/hdb4 already mounted or /mnt busy
mount: /dev/hdb5 already mounted or /mnt busy
mount: /dev/hdb6 already mounted or /mnt busy
[EMAIL PROTECTED]:~# mount /dev/hda1 /mnt

I can fdisk it, but none of the partitions are available.  Knoppix can
access it normally, so it's not a hardware issue.  No funny messages in
the syslog.  So I'll go off and stop harassing this list.  ;)


Thanks,
Jim.


Raid 1, new disk can't be added after replacing faulty disk

2008-01-07 Thread Radu Rendec
I'm experiencing trouble when trying to add a new disk to a raid 1 array
after having replaced a faulty disk.

A few details about my configuration:

# cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb3[1]
  151388452 blocks super 1.0 [2/1] [_U]
  
md0 : active raid1 sdb2[1]
  3911816 blocks super 1.0 [2/1] [_U]
  
unused devices: <none>

# uname -a
Linux i.ines.ro 2.6.23.8-63.fc8 #1 SMP Wed Nov 21 18:51:08 EST 2007 i686 i686 i386 GNU/Linux

# mdadm --version
mdadm - v2.6.2 - 21st May 2007

So the story is this: disk sda failed and was physically replaced with a
new one. The new disk is identical and was partitioned exactly the same way
as the old one (and as sdb). Adding sda2 (from the fresh, empty disk) to the
array does not work. This is what happens:

# mdadm /dev/md0 -a /dev/sda2
mdadm: add new device failed for /dev/sda2 as 2: Invalid argument

Kernel messages follow:
md: sda2 does not have a valid v1.0 superblock, not importing!
md: md_import_device returned -22

It's obvious that sda2 does not have a superblock (at all) since it's a
fresh empty disk. But I expected mdadm to create the superblock and
start rebuilding the array immediately.

However, this happens with both mdadm 2.6.2 and 2.6.4. I downgraded to
2.5.4 and it works like a charm.

If you reply, please add me to Cc, as I am not subscribed to the list.
If you need further details or any kind of assistance with testing, please
let me know.

Thanks,

Radu Rendec



Is that normal a removed part in RAID0 still showed as active sync

2008-01-07 Thread Hxsrmeng

/dev/md0 is set up as RAID0.
cat /proc/mdstat shows
md0 : active raid0 sda1[0] sdd1[3] sdc1[2] sdb1[1]
157307904 blocks 64k chunks

Then sdd is removed.

But cat /proc/mdstat still shows the same information as above, while the two
RAID5 arrays show their sdd members as (F):
md0 : active raid0 sda1[0] sdd1[3] sdc1[2] sdb1[1]
157307904 blocks 64k chunks

Is this normal?

Also, when using mdadm --detail, sdd1 (part of the RAID0) is shown as
active sync, but sdd2 (which is part of a RAID5) is shown as removed.

Thank you.

-- 
View this message in context: 
http://www.nabble.com/Is-that-normal-a-removed-part-in-RAID0-still-showed-as-%22active-sync%22-tp14670113p14670113.html
Sent from the linux-raid mailing list archive at Nabble.com.



Re: On the subject of RAID-6 corruption recovery

2008-01-07 Thread H. Peter Anvin

Mattias Wadenstein wrote:

On Mon, 7 Jan 2008, Thiemo Nagel wrote:

What you call pathological cases are very common when it comes to real-world
data.  It is not at all unusual to find sectors filled with only a constant
value (usually zero, but not always), in which case your **512 becomes **1.


Of course, it would be easy to check on a case-by-case basis how many of the
512 bytes really differ, to correct the exponent accordingly, and to perform
the recovery only when the corrected probability of introducing an error is
sufficiently low.


What is the alternative to recovery, really? Just erroring out and letting
the admin deal with it, or blindly assuming that the parity is wrong?




Erroring out.  Only thing to do at that point.

-hpa


Re: Raid 1, new disk can't be added after replacing faulty disk

2008-01-07 Thread Dan Williams
On Jan 7, 2008 6:44 AM, Radu Rendec [EMAIL PROTECTED] wrote:
 I'm experiencing trouble when trying to add a new disk to a raid 1 array
 after having replaced a faulty disk.

[..]
 # mdadm --version
 mdadm - v2.6.2 - 21st May 2007

[..]
 However, this happens with both mdadm 2.6.2 and 2.6.4. I downgraded to
 2.5.4 and it works like a charm.

Looks like you are running into the issue described here:
http://marc.info/?l=linux-raid&m=119892098129022&w=2


Re: Raid 1, can't get the second disk added back in.

2008-01-07 Thread Neil Brown
On Monday January 7, [EMAIL PROTECTED] wrote:
 The problem is not RAID, or at least not obviously RAID-related.  The
 problem is that the whole disk, /dev/hdb, is unavailable.

Maybe check /sys/block/hdb/holders ?  lsof /dev/hdb ?

good luck :-)

NeilBrown


Re: Why mdadm --monitor --program sometimes only gives 2 command-line arguments to the program?

2008-01-07 Thread Neil Brown
On Saturday January 5, [EMAIL PROTECTED] wrote:
 
 Hi all,
 
 I need to monitor my RAID and if it fails, I'd like to call my-script to
 deal with the failure.
 
 I did: 
 mdadm --monitor --program my-script --delay 60 /dev/md1
 
 And then, I simulate a failure with
 mdadm --manage --set-faulty /dev/md1 /dev/sda2
 mdadm /dev/md1 --remove /dev/sda2
 
 I had hoped that mdadm's monitor mode would pass all three command-line
 arguments to my-script: the name of the event, the name of the md device,
 and the name of a related device if relevant.
 
 But my-script doesn't get the third one, which should be /dev/sda2. Is
 this not relevant?
 
 If I really need to know it's /dev/sda2 that fails, what can I do?

What version of mdadm are you using?
I'm guessing 2.6, 2.6.1, or 2.6.2.
There was a bug introduced in 2.6 that was fixed in 2.6.3 that would
have this effect.

NeilBrown
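
For illustration only (this sketch is not from the thread): with a fixed
mdadm, the program named by --program is run with two or three arguments,
namely the event name, the md device, and, when relevant, the component
device.  A minimal handler, written here as a small C program with a made-up
log path, could look like this:

#include <stdio.h>
#include <time.h>

/* Minimal sketch of an mdadm --monitor --program handler (illustration only).
 * argv[1] = event name (e.g. Fail), argv[2] = md device, argv[3] = component
 * device when one is relevant.  The log path is made up for this example. */
int main(int argc, char *argv[])
{
	const char *event     = argc > 1 ? argv[1] : "(none)";
	const char *md_device = argc > 2 ? argv[2] : "(none)";
	const char *component = argc > 3 ? argv[3] : "(none)";
	FILE *log = fopen("/var/log/md-events.log", "a");

	if (!log)
		return 1;
	fprintf(log, "%ld event=%s array=%s component=%s\n",
		(long)time(NULL), event, md_device, component);
	fclose(log);
	return 0;
}

With the bug described above, the third field would simply come out as
"(none)" even for a Fail event on /dev/sda2.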


Re: Raid 1, new disk can't be added after replacing faulty disk

2008-01-07 Thread Neil Brown
On Monday January 7, [EMAIL PROTECTED] wrote:
 On Jan 7, 2008 6:44 AM, Radu Rendec [EMAIL PROTECTED] wrote:
  I'm experiencing trouble when trying to add a new disk to a raid 1 array
  after having replaced a faulty disk.
 
 [..]
  # mdadm --version
  mdadm - v2.6.2 - 21st May 2007
 
 [..]
  However, this happens with both mdadm 2.6.2 and 2.6.4. I downgraded to
  2.5.4 and it works like a charm.
 
 Looks like you are running into the issue described here:
 http://marc.info/?l=linux-raid&m=119892098129022&w=2

I cannot easily reproduce this.  I suspect it is sensitive to the
exact size of the devices involved.

Please test this patch and see if it fixes the problem.
If not, please tell me the exact sizes of the partitions being used
(e.g. cat /proc/partitions) and I will try harder to reproduce it.

Thanks,
NeilBrown



diff --git a/super1.c b/super1.c
index 2b096d3..9eec460 100644
--- a/super1.c
+++ b/super1.c
@@ -903,7 +903,7 @@ static int write_init_super1(struct supertype *st, void *sbv,
 	 * for a bitmap.
 	 */
 	array_size = __le64_to_cpu(sb->size);
-	/* work out how much space we left of a bitmap */
+	/* work out how much space we left for a bitmap */
 	bm_space = choose_bm_space(array_size);
 
 	switch(st->minor_version) {
@@ -913,6 +913,8 @@ static int write_init_super1(struct supertype *st, void *sbv,
 		sb_offset &= ~(4*2-1);
 		sb->super_offset = __cpu_to_le64(sb_offset);
 		sb->data_offset = __cpu_to_le64(0);
+		if (sb_offset - bm_space < array_size)
+			bm_space = sb_offset - array_size;
 		sb->data_size = __cpu_to_le64(sb_offset - bm_space);
 		break;
 	case 1:
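
As a toy illustration of what the two added lines do (made-up numbers, not
taken from Radu's report): when reserving the full bitmap space would leave
less data space than the recorded array size, the reservation is shrunk so
that data_size comes out at least as large as array_size.

#include <stdio.h>

/* Illustration only: the effect of the added check for a device that is
 * barely large enough.  All sizes are hypothetical, in 512-byte sectors. */
int main(void)
{
	unsigned long long array_size = 7823632;            /* hypothetical sb->size */
	unsigned long long sb_offset  = array_size + 100;   /* hypothetical: little room to spare */
	unsigned long long bm_space   = 4096;               /* hypothetical bitmap reservation */
	unsigned long long old_data_size = sb_offset - bm_space;
	unsigned long long new_bm_space  = bm_space;

	if (sb_offset - new_bm_space < array_size)
		new_bm_space = sb_offset - array_size;

	printf("without the patch: data_size = %llu (less than array_size %llu)\n",
	       old_data_size, array_size);
	printf("with the patch:    data_size = %llu (no smaller than array_size)\n",
	       sb_offset - new_bm_space);
	return 0;
}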