confused raid1
I've inherited responsibility for a server with a root raid1 that degrades
every time the system is rebooted. It's a 2.4.x kernel. I've got both
raidutils and mdadm available. The raid1 device is supposed to be /dev/hde1
and /dev/hdg1 with /dev/hdc1 as a spare. I believe it was created with
raidutils and the following portion of /etc/raidtab:

raiddev /dev/md1
        raid-level              1
        nr-raid-disks           2
        chunk-size              64k
        persistent-superblock   1
        nr-spare-disks          1
        device                  /dev/hde1
        raid-disk               0
        device                  /dev/hdg1
        raid-disk               1
        device                  /dev/hdc1
        spare-disk              0

The output of mdadm -E concerns me, though.

# mdadm -E /dev/hdc1
/dev/hdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8b65fa52:21176cc9:cbb74149:c418b5a4
  Creation Time : Tue Jan 13 13:21:41 2004
     Raid Level : raid1
    Device Size : 30716160 (29.29 GiB 31.45 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Update Time : Thu Aug 11 08:38:59 2005
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : -1
  Spare Devices : 0
       Checksum : 6a4dddb8 - correct
         Events : 0.195

      Number   Major   Minor   RaidDevice State
this     1      22       1        1      active sync   /dev/hdc1
   0     0      33       1        0      active sync   /dev/hde1
   1     1      22       1        1      active sync   /dev/hdc1

# mdadm -E /dev/hde1
/dev/hde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8b65fa52:21176cc9:cbb74149:c418b5a4
  Creation Time : Tue Jan 13 13:21:41 2004
     Raid Level : raid1
    Device Size : 30716160 (29.29 GiB 31.45 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Update Time : Mon Aug 15 11:16:43 2005
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : -1
  Spare Devices : 0
       Checksum : 6a5348c9 - correct
         Events : 0.199

      Number   Major   Minor   RaidDevice State
this     0      33       1        0      active sync   /dev/hde1
   0     0      33       1        0      active sync   /dev/hde1
   1     1      34       1        1      active sync   /dev/hdg1

# mdadm -E /dev/hdg1
/dev/hdg1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8b65fa52:21176cc9:cbb74149:c418b5a4
  Creation Time : Tue Jan 13 13:21:41 2004
     Raid Level : raid1
    Device Size : 30716160 (29.29 GiB 31.45 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Update Time : Mon Aug 15 11:16:43 2005
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : -1
  Spare Devices : 0
       Checksum : 6a5348cc - correct
         Events : 0.199

      Number   Major   Minor   RaidDevice State
this     1      34       1        1      active sync   /dev/hdg1
   0     0      33       1        0      active sync   /dev/hde1
   1     1      34       1        1      active sync   /dev/hdg1

Shouldn't Total Devices be at least 2? How can Failed Devices be -1?

When the system reboots, md1 becomes just /dev/hdc1. I've used mdadm to add
hde1, fail and then remove hdc1, and add hdg1. How can I repair the array so
that it will survive the next reboot and keep hde1 and hdg1 as the working
devices?

md1 : active raid1 hdg1[1] hde1[0]
      30716160 blocks [2/2] [UU]

--
 Jon Lewis               | I route
 Senior Network Engineer | therefore you are
 Atlantic Net            |
 http://www.lewis.org/~jlewis/pgp for PGP public key
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
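[For reference, the manual shuffle described above, as a sequence of mdadm
manage-mode commands. This is only a sketch for an mdadm of that era,
assuming /dev/md1 is already running with the device names as given; check
/proc/mdstat between steps, since a resync must finish before the next disk
is touched.]

```shell
# Current state of all md arrays
cat /proc/mdstat

# Put hde1 back into the mirror; the kernel resyncs onto it
mdadm /dev/md1 --add /dev/hde1

# Once the resync finishes, push hdc1 out of the active pair
mdadm /dev/md1 --fail /dev/hdc1
mdadm /dev/md1 --remove /dev/hdc1

# Bring hdg1 back in; after resync it becomes the second active disk
mdadm /dev/md1 --add /dev/hdg1

# Verify both halves are active: expect "[2/2] [UU]"
cat /proc/mdstat
```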
Re: confused raid1
A few questions:

a) What kernel version are you using?
b) What mdadm version are you using?
c) What messages concerning the raid are in the log when it's failing one
   of the drives and making hdc1 an active drive?
d) What Linux distribution (and version) are you using?

Tyler.
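[The four answers can be pulled straight off the box. A sketch; the log
path /var/log/messages and the release file names are assumptions that vary
by distribution.]

```shell
uname -r                        # a) kernel version
mdadm --version                 # b) mdadm version

# c) md messages from around boot time
grep -i 'md:' /var/log/messages | tail -40

# d) distribution and version
cat /etc/redhat-release 2>/dev/null || cat /etc/issue
```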
Re: confused raid1
On Mon, 15 Aug 2005, Mario 'BitKoenig' Holbe wrote:

> Well, reading the kernel boot messages could help. Perhaps the hdc1
> partition is type fd (raid autodetect) and the driver for hd[eg] is not
> in place when the RAID autodetection is running.

I should have included that. All 3 of them are type fd.

--
 Jon Lewis               | I route
 Senior Network Engineer | therefore you are
 Atlantic Net            |
 http://www.lewis.org/~jlewis/pgp for PGP public key
Re: confused raid1
Well, I guess you won't know if you don't try. Do your other servers
produce the same error in their logs upon bootup, regarding the module?

Tyler.

Jon Lewis wrote:
> On Mon, 15 Aug 2005, Tyler wrote:
>> Try this suggestion (regarding modules.conf).
>> https://www.redhat.com/archives/fedora-list/2003-December/msg05205.html
>
> I don't see why that modules.conf addition would be necessary / make a
> difference. I have other servers with root-raid1 that haven't needed
> that, and mkinitrd is smart enough (it reads /etc/raidtab) to know that
> raid1 is needed and loads the raid1 module in the initrd linuxrc script.
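[One way to settle whether this particular initrd actually carries the
raid1 module is to unpack it and look. A sketch for a 2.4-era gzipped ext2
initrd; the image path and mount point are assumptions, adjust to match
the bootloader config.]

```shell
# 2.4-era initrds are usually gzipped ext2 images
zcat /boot/initrd-$(uname -r).img > /tmp/initrd.img
mkdir -p /mnt/initrd
mount -o loop /tmp/initrd.img /mnt/initrd

# Look for raid1.o and for the commands linuxrc runs at boot
ls /mnt/initrd/lib
cat /mnt/initrd/linuxrc

umount /mnt/initrd
```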
Re: confused raid1
On Monday August 15, [EMAIL PROTECTED] wrote:
> It's quite probable that before the following reboot, md1 was hdc1 and
> hde1.
>
> Aug  9 02:02:39 kernel: md: created md1
> Aug  9 02:02:39 kernel: md: bind<hdc1,1>
> Aug  9 02:02:39 kernel: md: bind<hde1,2>
> Aug  9 02:02:39 kernel: md: bind<hdg1,3>
> Aug  9 02:02:39 kernel: md: running: <hdg1><hde1><hdc1>
> Aug  9 02:02:39 kernel: md: hdg1's event counter: 00b0
> Aug  9 02:02:39 kernel: md: hde1's event counter: 00b4
> Aug  9 02:02:39 kernel: md: hdc1's event counter: 00b4
> Aug  9 02:02:39 kernel: md: superblock update time inconsistency -- using the most recent one
> Aug  9 02:02:39 kernel: md: freshest: hde1
> Aug  9 02:02:39 kernel: md: kicking non-fresh hdg1 from array!
> Aug  9 02:02:39 kernel: md: unbind<hdg1,2>
> Aug  9 02:02:39 kernel: md: export_rdev(hdg1)
> Aug  9 02:02:39 kernel: md: RAID level 1 does not need chunksize! Continuing anyway.
> Aug  9 02:02:39 kernel: kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
> Aug  9 02:02:39 kernel: md: personality 3 is not loaded!
> Aug  9 02:02:39 kernel: md :do_md_run() returned -22
> Aug  9 02:02:39 kernel: md: md1 stopped.
> Aug  9 02:02:39 kernel: md: unbind<hde1,1>
> Aug  9 02:02:39 kernel: md: export_rdev(hde1)
> Aug  9 02:02:39 kernel: md: unbind<hdc1,0>
> Aug  9 02:02:39 kernel: md: export_rdev(hdc1)
> Aug  9 02:02:39 kernel: md: ... autorun DONE.

So md-personality-3 doesn't get loaded, and the array doesn't get started
at all; i.e. the 'partition type FD' is not having any useful effect.

So how does the array get started? Are there other messages about md later
in the kernel logs that talk about md1?

My guess is that 'raidstart' is being used to start the array somewhere
along the line. 'raidstart' does not start raid arrays reliably. Don't use
it. Remove it from your system. It is unsafe.

If you cannot get the raid1 module to be loaded properly, make sure that
'mdadm' is being used to assemble the array. It has a much better chance
of getting it right.

NeilBrown
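[Assembling with mdadm instead of raidstart could look like the sketch
below, reusing the UUID from the -E output earlier in the thread. The
/etc/mdadm.conf location and the idea of calling this from a boot script
are assumptions that depend on the distribution.]

```shell
# One-off assembly by member device
mdadm --assemble /dev/md1 /dev/hde1 /dev/hdg1 /dev/hdc1

# Or record the array in /etc/mdadm.conf so it can be found by UUID
cat >> /etc/mdadm.conf <<'EOF'
DEVICE /dev/hde1 /dev/hdg1 /dev/hdc1
ARRAY /dev/md1 UUID=8b65fa52:21176cc9:cbb74149:c418b5a4
EOF

# Then a boot script (or linuxrc in the initrd) can simply run:
mdadm --assemble --scan
```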