Re: On the subject of RAID-6 corruption recovery
What you call pathological cases are very common in real-world data. It is not at all unusual to find sectors filled with a single constant (usually zero, but not always), in which case your **512 becomes **1.

Of course, it would be easy to check on a case-by-case basis how many of the 512 bytes really differ, correct the exponent accordingly, and only perform the recovery when the corrected probability of introducing an error is sufficiently low.

Kind regards,

Thiemo Nagel
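[Editor's note: the following is a minimal, self-contained C sketch of one possible reading of the exponent-correction idea above, not the md driver's code. It counts the distinct byte values in the sector and treats that count as the number of independent checks; the threshold and function names are purely illustrative assumptions.]

#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512
#define MIN_INDEPENDENT_BYTES 8   /* assumed cut-off, purely illustrative */

/* Count the distinct byte values in a sector.  A constant-filled
 * sector yields 1, which corresponds to the **512 -> **1 case above. */
static size_t effective_exponent(const uint8_t sector[SECTOR_SIZE])
{
        int seen[256] = {0};
        size_t n = 0;

        for (size_t i = 0; i < SECTOR_SIZE; i++) {
                if (!seen[sector[i]]) {
                        seen[sector[i]] = 1;
                        n++;
                }
        }
        return n;
}

/* Only attempt automatic RAID-6 correction when enough independent
 * byte positions back the identification of the failing block. */
static int recovery_is_safe(const uint8_t sector[SECTOR_SIZE])
{
        return effective_exponent(sector) >= MIN_INDEPENDENT_BYTES;
}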
Re: On the subject of RAID-6 corruption recovery
On Mon, 7 Jan 2008, Thiemo Nagel wrote:

> What you call pathological cases are very common in real-world data. It is not at all unusual to find sectors filled with a single constant (usually zero, but not always), in which case your **512 becomes **1.
>
> Of course, it would be easy to check on a case-by-case basis how many of the 512 bytes really differ, correct the exponent accordingly, and only perform the recovery when the corrected probability of introducing an error is sufficiently low.

What is the alternative to recovery, really? Just erroring out and letting the admin deal with it, or blindly assuming that the parity is wrong?

/Mattias Wadenstein
Re: On the subject of RAID-6 corruption recovery
Mattias Wadenstein wrote:
> On Mon, 7 Jan 2008, Thiemo Nagel wrote:
>> What you call pathological cases are very common in real-world data. It is not at all unusual to find sectors filled with a single constant (usually zero, but not always), in which case your **512 becomes **1.
>>
>> Of course, it would be easy to check on a case-by-case basis how many of the 512 bytes really differ, correct the exponent accordingly, and only perform the recovery when the corrected probability of introducing an error is sufficiently low.
>
> What is the alternative to recovery, really? Just erroring out and letting the admin deal with it, or blindly assuming that the parity is wrong?

Currently, 'repair' does a blind recalculation of parity. The only benefit of that is (correct me if I'm wrong) to ascertain that repeated reads return identical data. The last time I checked, there was not even a warning message.

Kind regards,

Thiemo Nagel
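[Editor's note: as a rough, self-contained illustration of what a "blind recalculation" amounts to. This is not the kernel's md code, and only the XOR parity P is shown; the RAID-6 Q syndrome is omitted. Block and stripe sizes are arbitrary example values.]

#include <stdint.h>
#include <string.h>

#define NDATA 4      /* data blocks per stripe (arbitrary for the example) */
#define BLOCK 4096   /* block size in bytes (arbitrary for the example) */

/* Recompute the XOR parity from the data blocks. */
static void compute_p(const uint8_t data[NDATA][BLOCK], uint8_t p[BLOCK])
{
        memset(p, 0, BLOCK);
        for (int d = 0; d < NDATA; d++)
                for (int i = 0; i < BLOCK; i++)
                        p[i] ^= data[d][i];
}

/* A "blind" repair pass simply overwrites parity with whatever the
 * data blocks currently imply.  If a data block was the corrupted
 * one, the previously correct parity is silently rewritten to match
 * the bad data, and nothing is logged. */
static void blind_repair(const uint8_t data[NDATA][BLOCK], uint8_t p[BLOCK])
{
        compute_p(data, p);
}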
Re: Raid 1, can't get the second disk added back in.
Neil Brown wrote:
> On Saturday January 5, [EMAIL PROTECTED] wrote:
>> [EMAIL PROTECTED]:~# mdadm /dev/md0 --add /dev/hdb5
>> mdadm: Cannot open /dev/hdb5: Device or resource busy
>>
>> All the solutions I've been able to google fail with the same busy error. There is nothing that I can find that might be using /dev/hdb5 except the raid device, and it appears it's not either.
>
> Very odd. But something must be using it.
>
> What does
>   ls -l /sys/block/hdb/hdb5/holders
> show?
>
> What about
>   cat /proc/mounts
>   cat /proc/swaps
>   lsof /dev/hdb5
> ??
>
> NeilBrown

I agree, but for the life of me I can't figure out what, other than some raid daemon.

[EMAIL PROTECTED]:~# ls -l /sys/block/hdb/hdb5/holders
total 0
[EMAIL PROTECTED]:~# cat /proc/mounts
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw 0 0
/dev/disk/by-uuid/4f67dae8-cdcb-460e-86cd-a5f0e4009422 / ext3 rw,data=ordered 0 0
/dev/disk/by-uuid/4f67dae8-cdcb-460e-86cd-a5f0e4009422 /dev/.static/dev ext3 rw,data=ordered 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /dev/shm tmpfs rw 0 0
devpts /dev/pts devpts rw 0 0
usbfs /dev/bus/usb/.usbfs usbfs rw 0 0
udev /proc/bus/usb tmpfs rw 0 0
usbfs /proc/bus/usb/.usbfs usbfs rw 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
/dev/sda5 /home ext3 rw,data=ordered 0 0
/dev/md0 /backupmirror ext3 rw,data=ordered 0 0
/dev/hda1 /vz ext3 rw,data=ordered 0 0
/vz/private/225 /vz/root/225 simfs rw 0 0
/vz/private/300 /vz/root/300 simfs rw 0 0
/proc /vz/root/225/proc proc rw 0 0
/sys /vz/root/225/sys sysfs rw 0 0
none /vz/root/225/dev/pts devpts rw 0 0
/proc /vz/root/300/proc proc rw 0 0
/sys /vz/root/300/sys sysfs rw 0 0
none /vz/root/300/dev/pts devpts rw 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
[EMAIL PROTECTED]:~# cat /proc/mounts | grep hdb
[EMAIL PROTECTED]:~# cat /proc/swaps
[EMAIL PROTECTED]:~# lsof /dev/hdb5
Re: Raid 1, can't get the second disk added back in.
Neil Brown wrote:
> On Saturday January 5, [EMAIL PROTECTED] wrote:
>> [EMAIL PROTECTED]:~# mdadm /dev/md0 --add /dev/hdb5
>> mdadm: Cannot open /dev/hdb5: Device or resource busy
>>
>> All the solutions I've been able to google fail with the same busy error. There is nothing that I can find that might be using /dev/hdb5 except the raid device, and it appears it's not either.
>
> Very odd. But something must be using it.
>
> What does
>   ls -l /sys/block/hdb/hdb5/holders
> show?
>
> What about
>   cat /proc/mounts
>   cat /proc/swaps
>   lsof /dev/hdb5
> ??
>
> NeilBrown

The problem is not raid, or at least not obviously raid related. The problem is that the whole disk, /dev/hdb, is unavailable.

[EMAIL PROTECTED]:~# for i in /dev/hdb? ; do mount $i /mnt ; done
mount: /dev/hdb1 already mounted or /mnt busy
mount: /dev/hdb2 already mounted or /mnt busy
mount: /dev/hdb3 already mounted or /mnt busy
mount: /dev/hdb4 already mounted or /mnt busy
mount: /dev/hdb5 already mounted or /mnt busy
mount: /dev/hdb6 already mounted or /mnt busy
[EMAIL PROTECTED]:~# mount /dev/hda1 /mnt

I can fdisk it, but none of the partitions are available. Knoppix can access it normally, so it's not a hardware issue. No funny messages in the syslog. So I'll go off and stop harassing this list. ;)

Thanks,
Jim.
Raid 1, new disk can't be added after replacing faulty disk
I'm experiencing trouble when trying to add a new disk to a RAID 1 array after having replaced a faulty disk. A few details about my configuration:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb3[1]
      151388452 blocks super 1.0 [2/1] [_U]

md0 : active raid1 sdb2[1]
      3911816 blocks super 1.0 [2/1] [_U]

unused devices: <none>

# uname -a
Linux i.ines.ro 2.6.23.8-63.fc8 #1 SMP Wed Nov 21 18:51:08 EST 2007 i686 i686 i386 GNU/Linux

# mdadm --version
mdadm - v2.6.2 - 21st May 2007

So the story is this: disk sda failed and was physically replaced with a new one. The new disk is identical and was partitioned exactly the same way (as the old one and sdb). Adding sda2 (from the fresh, empty disk) to the array does not work. This is what happens:

# mdadm /dev/md0 -a /dev/sda2
mdadm: add new device failed for /dev/sda2 as 2: Invalid argument

Kernel messages follow:

md: sda2 does not have a valid v1.0 superblock, not importing!
md: md_import_device returned -22

It's obvious that sda2 does not have a superblock (at all), since it's a fresh, empty disk. But I expected mdadm to create the superblock and start rebuilding the array immediately. This happens with both mdadm 2.6.2 and 2.6.4. I downgraded to 2.5.4 and it works like a charm.

If you reply, please add me to Cc - I am not subscribed to the list. Should you need further details or any kind of assistance for testing, please let me know.

Thanks,

Radu Rendec
Is that normal a removed part in RAID0 still showed as active sync
The /dev/md0 device is set up as RAID0. cat /proc/mdstat shows:

md0 : active raid0 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      157307904 blocks 64k chunks

Then sdd is removed. But cat /proc/mdstat still shows the same information as above, while two RAID5 devices show their sdd parts as (F):

md0 : active raid0 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      157307904 blocks 64k chunks

Is this normal? Also, when using mdadm --detail, sdd1 (part of the RAID0) is shown as active sync, but sdd2 (which is part of the RAID5) is shown as removed.

Thank you.
Re: On the subject of RAID-6 corruption recovery
Mattias Wadenstein wrote:
> On Mon, 7 Jan 2008, Thiemo Nagel wrote:
>> What you call pathological cases are very common in real-world data. It is not at all unusual to find sectors filled with a single constant (usually zero, but not always), in which case your **512 becomes **1.
>>
>> Of course, it would be easy to check on a case-by-case basis how many of the 512 bytes really differ, correct the exponent accordingly, and only perform the recovery when the corrected probability of introducing an error is sufficiently low.
>
> What is the alternative to recovery, really? Just erroring out and letting the admin deal with it, or blindly assuming that the parity is wrong?

Erroring out. Only thing to do at that point.

	-hpa
Re: Raid 1, new disk can't be added after replacing faulty disk
On Jan 7, 2008 6:44 AM, Radu Rendec [EMAIL PROTECTED] wrote:
> I'm experiencing trouble when trying to add a new disk to a RAID 1 array after having replaced a faulty disk.
[..]
> # mdadm --version
> mdadm - v2.6.2 - 21st May 2007
[..]
> However, this happens with both mdadm 2.6.2 and 2.6.4. I downgraded to 2.5.4 and it works like a charm.

Looks like you are running into the issue described here:

http://marc.info/?l=linux-raid&m=119892098129022&w=2
Re: Raid 1, can't get the second disk added back in.
On Monday January 7, [EMAIL PROTECTED] wrote:
> Problem is not raid, or at least not obviously raid related. The problem is that the whole disk, /dev/hdb, is unavailable.

Maybe check
  /sys/block/hdb/holders ?
  lsof /dev/hdb ?

good luck :-)

NeilBrown
Re: Why mdadm --monitor --program sometimes only gives 2 command-line arguments to the program?
On Saturday January 5, [EMAIL PROTECTED] wrote:
> Hi all,
>
> I need to monitor my RAID and, if it fails, I'd like to call my-script to deal with the failure. I did:
>
>   mdadm --monitor --program my-script --delay 60 /dev/md1
>
> Then I simulated a failure with:
>
>   mdadm --manage --set-faulty /dev/md1 /dev/sda2
>   mdadm /dev/md1 --remove /dev/sda2
>
> I expected the mdadm monitor function to pass all three command-line arguments to my-script: the name of the event, the name of the md device, and the name of a related device if relevant. But my-script doesn't get the third one, which should be /dev/sda2. Is this not relevant? If I really need to know that it's /dev/sda2 that failed, what can I do?

What version of mdadm are you using? I'm guessing 2.6, 2.6.1, or 2.6.2. There was a bug introduced in 2.6 that was fixed in 2.6.3 that would have this effect.

NeilBrown
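[Editor's note: for reference, the --program handler receives the event name, the md device, and, when one is relevant, the component device as separate arguments, as described above. Below is a minimal sketch of such a handler written in C (a shell script is the more usual choice); the log file path is an arbitrary assumption for the example.]

#include <stdio.h>

int main(int argc, char **argv)
{
        /* mdadm passes: argv[1] = event name (e.g. Fail), argv[2] = md
         * device, argv[3] = component device when one is relevant. */
        const char *event     = argc > 1 ? argv[1] : "(none)";
        const char *md_device = argc > 2 ? argv[2] : "(none)";
        const char *component = argc > 3 ? argv[3] : "(none)";

        FILE *log = fopen("/var/log/md-events.log", "a");  /* path is arbitrary */
        if (!log)
                return 1;
        fprintf(log, "event=%s md=%s component=%s\n", event, md_device, component);
        fclose(log);
        return 0;
}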
Re: Raid 1, new disk can't be added after replacing faulty disk
On Monday January 7, [EMAIL PROTECTED] wrote:
> On Jan 7, 2008 6:44 AM, Radu Rendec [EMAIL PROTECTED] wrote:
>> I'm experiencing trouble when trying to add a new disk to a RAID 1 array after having replaced a faulty disk.
> [..]
>> # mdadm --version
>> mdadm - v2.6.2 - 21st May 2007
> [..]
>> However, this happens with both mdadm 2.6.2 and 2.6.4. I downgraded to 2.5.4 and it works like a charm.
>
> Looks like you are running into the issue described here:
>
> http://marc.info/?l=linux-raid&m=119892098129022&w=2

I cannot easily reproduce this. I suspect it is sensitive to the exact size of the devices involved.

Please test this patch and see if it fixes the problem. If not, please tell me the exact sizes of the partitions being used (e.g. cat /proc/partitions) and I will try harder to reproduce it.

Thanks,
NeilBrown

diff --git a/super1.c b/super1.c
index 2b096d3..9eec460 100644
--- a/super1.c
+++ b/super1.c
@@ -903,7 +903,7 @@ static int write_init_super1(struct supertype *st, void *sbv,
 	 * for a bitmap.
 	 */
 	array_size = __le64_to_cpu(sb->size);
-	/* work out how much space we left of a bitmap */
+	/* work out how much space we left for a bitmap */
 	bm_space = choose_bm_space(array_size);
 
 	switch(st->minor_version) {
@@ -913,6 +913,8 @@ static int write_init_super1(struct supertype *st, void *sbv,
 		sb_offset &= ~(4*2-1);
 		sb->super_offset = __cpu_to_le64(sb_offset);
 		sb->data_offset = __cpu_to_le64(0);
+		if (sb_offset - bm_space < array_size)
+			bm_space = sb_offset - array_size;
 		sb->data_size = __cpu_to_le64(sb_offset - bm_space);
 		break;
 	case 1:
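[Editor's note: to make the size sensitivity concrete, here is a small, self-contained C illustration of the arithmetic in the hunk above. All sizes, including the bitmap reservation, are made-up example values, not taken from the report.]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical sizes, in 512-byte sectors. */
        uint64_t dsize      = 10000000;   /* size of the new component device */
        uint64_t array_size = 9999000;    /* data size recorded in the existing array */
        uint64_t bm_space   = 4096;       /* whatever choose_bm_space() returned; assumed */

        /* v1.0 superblock placement: near the end of the device, 4K-aligned. */
        uint64_t sb_offset = (dsize - 8 * 2) & ~(uint64_t)(4 * 2 - 1);

        uint64_t unpatched = sb_offset - bm_space;   /* old data_size calculation */

        if (sb_offset - bm_space < array_size)        /* the clamp added by the patch */
                bm_space = sb_offset - array_size;
        uint64_t patched = sb_offset - bm_space;

        printf("unpatched data_size = %llu (array needs %llu -> add rejected)\n",
               (unsigned long long)unpatched, (unsigned long long)array_size);
        printf("patched   data_size = %llu (fits -> add accepted)\n",
               (unsigned long long)patched);
        return 0;
}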