Re: raid problem
On 30/11/2013 06:39, Stan Hoeppner wrote:
> On 11/29/2013 4:43 PM, François Patte wrote:
>> [...] But for the md1 array, I get:
>>
>>   md1 : active raid1 sdc3[2](S) sdd3[0]
>>         483138688 blocks [2/1] [U_]
>>
>> What is the problem? And how can I recover a correct md1 array?
>
> IIRC Linux md rebuilds multiple degraded arrays sequentially, not in
> parallel, due to the system performance impact and other reasons. When
> the rebuild of md0 is finished, the rebuild of md1/sdc3 should start
> automatically. If this did not occur, please let us know and we'll go
> from there.

I thought I had made it clear that the mdadm --detail and mdadm --examine
output I posted is what those commands return *after* the rebuild of array
md1. On reboot, I am warned that md1 is started with one disk out of two
plus one spare, and recovery starts immediately:

  md1 : active raid1 sdd3[0] sdc3[2]
        483138688 blocks [2/1] [U_]
        [=>...................]  recovery =  7.5% (36521408/483138688)
        finish=89.1min speed=83445K/sec

After that, the situation is what is quoted in my previous message.

Regards.

--
François Patte
UFR de mathématiques et informatique
Laboratoire CNRS MAP5, UMR 8145
Université Paris Descartes
45, rue des Saints Pères
F-75270 Paris Cedex 06
Tél. +33 (0)1 8394 5849
http://www.math-info.univ-paris5.fr/~patte
Re: raid problem
On 2013-11-29 23:43 +0100, François Patte wrote:
> I have a problem with two RAID arrays: I have two disks (sdc and sdd) in
> RAID1 arrays. One disk (sdc) failed and I replaced it with a new one,
> copying the partition table from the sdd disk using sfdisk:
>
>   sfdisk -d /dev/sdd | sfdisk /dev/sdc
>
> Then I added the two partitions (sdc1 and sdc3) to the arrays md0 and md1:
>
>   mdadm --add /dev/md0 /dev/sdc1
>   mdadm --add /dev/md1 /dev/sdc3
>
> There was no problem with the md0 array; cat /proc/mdstat gives:
>
>   md0 : active raid1 sdc1[1] sdd1[0]
>         1052160 blocks [2/2] [UU]
>
> But for the md1 array, I get:
>
>   md1 : active raid1 sdc3[2](S) sdd3[0]
>         483138688 blocks [2/1] [U_]
>
> What is the problem? And how can I recover a correct md1 array?

The root of your problem would be that /dev/sdc3 is considered a spare,
not active. Not sure why.

Guess #1: before physically changing the disks, you forgot

  mdadm /dev/md1 --fail /dev/sdc3
  mdadm /dev/md1 --remove /dev/sdc3

Guess #2: maybe there were I/O errors during the add. How far did the sync
go? Run smartctl -d ata -A /dev/sdc3 and look for non-zero raw values for
Reallocated_Sector_Ct and Current_Pending_Sector. What does
badblocks /dev/sdc3 say?

Guess #3: it's a software hiccup, and all /dev/sdc3 needs is to be removed
from /dev/md1 and re-added.

--
André Majorel http://www.teaser.fr/~amajorel/
bugs.debian.org, a spammer's favourite.
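Spelled out as commands, guess #3 would look roughly like this (device
names as in this thread; the --zero-superblock step is an extra assumption
on top of the advice above, clearing the stale md superblock so the member
is treated as brand new):

  mdadm /dev/md1 --remove /dev/sdc3   # take the spare out of the array
  mdadm --zero-superblock /dev/sdc3   # assumption: wipe the old 0.90 superblock
  mdadm /dev/md1 --add /dev/sdc3      # re-add; a resync should start
  watch cat /proc/mdstat              # follow the rebuild progress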
Re: raid problem
On 30/11/2013 12:56, Andre Majorel wrote:
> On 2013-11-29 23:43 +0100, François Patte wrote:
>> [...] But for the md1 array, I get:
>>
>>   md1 : active raid1 sdc3[2](S) sdd3[0]
>>         483138688 blocks [2/1] [U_]
>>
>> What is the problem? And how can I recover a correct md1 array?
>
> The root of your problem would be that /dev/sdc3 is considered a spare,
> not active. Not sure why.

Thank you for answering.

> Guess #1: before physically changing the disks, you forgot
>
>   mdadm /dev/md1 --fail /dev/sdc3
>   mdadm /dev/md1 --remove /dev/sdc3

No, I didn't!

> Guess #2: maybe there were I/O errors during the add. How far did the
> sync go? Run smartctl -d ata -A /dev/sdc3 and look for non-zero raw
> values for Reallocated_Sector_Ct and Current_Pending_Sector. What does
> badblocks /dev/sdc3 say?

No non-zero values for these two... and no badblocks on sdc3.

> Guess #3: it's a software hiccup, and all /dev/sdc3 needs is to be
> removed from /dev/md1 and re-added.

I tried, without any success... But something is strange: there are some
badblocks on sdd3! logwatch reports errors on the sdd disk:

  md/raid1:md1: sdd: unrecoverable I/O read error for block 834749 ...: 3 Time(s)
  res 41/40:00:6f:56:61/00:00:32:00:00/40 Emask 0x409 (media error) F ...: 24 Time(s)
  sd 5:0:0:0: [sdd] Add. Sense: Unrecovered read error - auto reallocat ...: 6 Time(s)
  sd 5:0:0:0: [sdd] Sense Key : Medium Error [current] [descr ...: 6 Time(s)

And mdmonitor returns:

  This is an automatically generated mail message from mdadm running on
  dipankar. A FailSpare event had been detected on md device /dev/md1. It
  could be related to component device /dev/sdc3.

If I summarize the situation: the faulty disk (with badblocks) is sdd3,
but it is the only active disk in the md1 array, and I can fully access
the data on this disk, which is normally mounted at boot time, while the
disk sdc3 has no badblocks and is declared faulty by mdadm!!

I don't understand something! Anyway, I can delete this array and create a
new one from scratch (after replacing the faulty disk). Is it enough to
run these commands:

  mdadm --zero-superblock /dev/sdc3
  mdadm --zero-superblock /dev/sdd3

Or do I also have to modify the /etc/mdadm/mdadm.conf file?

Thank you for your answer.

--
François Patte
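For reference, a plausible tear-down-and-rebuild sequence answering the
question above (a sketch, not a tested recipe; it assumes md1 is unmounted,
and the last two steps exist because a recreated array gets a new UUID, so
any ARRAY line in /etc/mdadm/mdadm.conf that names the old UUID must be
replaced):

  umount /dev/md1
  mdadm --stop /dev/md1
  mdadm --zero-superblock /dev/sdc3
  mdadm --zero-superblock /dev/sdd3
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc3 /dev/sdd3
  mdadm --detail --scan     # copy the new ARRAY line into /etc/mdadm/mdadm.conf
  update-initramfs -u       # so the initramfs learns the new UUID too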
Re: raid problem
On 2013-11-30 19:48 +0100, François Patte wrote:
> If I summarize the situation: the faulty disk (with badblocks) is sdd3,
> but it is the only active disk in the md1 array, and I can fully access
> the data on this disk, which is normally mounted at boot time, while the
> disk sdc3 has no badblocks and is declared faulty by mdadm!!

To me it sounds like mdadm is confused and thinks the old sdc3 is still
here. When you add the new sdc3, mdadm views it as a third device, which
is one more device than md1 was created with, so the new sdc3 is marked as
spare. I know the output of mdadm --detail /dev/md1 contradicts this, but
it's my best shot.

> I don't understand something! Anyway, I can delete this array and create
> a new one from scratch (after replacing the faulty disk).

Yes, re-creating the array from scratch seems the next thing to try. Note
that unless you first manually copy the contents of sdd3 onto sdc3, you
had better create md1 in two steps (first with just sdd3, then --add
sdc3).

> Is it enough to run these commands:
>
>   mdadm --zero-superblock /dev/sdc3
>   mdadm --zero-superblock /dev/sdd3
>
> Or do I also have to modify the /etc/mdadm/mdadm.conf file?

No idea, I don't use mdadm.conf. You may get better help from
linux-r...@vger.kernel.org

--
André Majorel http://www.teaser.fr/~amajorel/
Thanks to the Debian project for going to such lengths never to disclose
the email addresses of their users. Think of all the spam we would get if
they didn't!
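The two-step creation described above, spelled out (device names from the
thread; 'missing' is mdadm's keyword for a deliberately absent member,
which creates the mirror degraded):

  # step 1: a degraded mirror holding only the disk with the data
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd3 missing

  # step 2: add the new disk; md copies sdd3 onto sdc3 in the background
  mdadm /dev/md1 --add /dev/sdc3

Given the unrecoverable read errors reported on sdd3 earlier in this
thread, the sync may still fail partway; the unreadable sectors would need
to be dealt with first.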
raid problem
Good evening,

I have a problem with two RAID arrays: I have two disks (sdc and sdd) in
RAID1 arrays. One disk (sdc) failed and I replaced it with a new one,
copying the partition table from the sdd disk using sfdisk:

  sfdisk -d /dev/sdd | sfdisk /dev/sdc

Then I added the two partitions (sdc1 and sdc3) to the arrays md0 and md1:

  mdadm --add /dev/md0 /dev/sdc1
  mdadm --add /dev/md1 /dev/sdc3

There was no problem with the md0 array; cat /proc/mdstat gives:

  md0 : active raid1 sdc1[1] sdd1[0]
        1052160 blocks [2/2] [UU]

But for the md1 array, I get:

  md1 : active raid1 sdc3[2](S) sdd3[0]
        483138688 blocks [2/1] [U_]

And mdadm --detail /dev/md1 returns:

  /dev/md1:
          Version : 0.90
    Creation Time : Sat Mar  7 11:48:30 2009
       Raid Level : raid1
       Array Size : 483138688 (460.76 GiB 494.73 GB)
    Used Dev Size : 483138688 (460.76 GiB 494.73 GB)
     Raid Devices : 2
    Total Devices : 2
  Preferred Minor : 1
      Persistence : Superblock is persistent
      Update Time : Fri Nov 29 21:23:25 2013
            State : clean, degraded
   Active Devices : 1
  Working Devices : 2
   Failed Devices : 0
    Spare Devices : 1
             UUID : 2e8294de:9b0d8d96:680a5413:2aac5c13
           Events : 0.72076

      Number   Major   Minor   RaidDevice State
         0       8       51        0      active sync   /dev/sdd3
         2       0        0        2      removed
         2       8       35        -      spare   /dev/sdc3

While mdadm --examine /dev/sdc3 returns:

  /dev/sdc3:
            Magic : a92b4efc
          Version : 0.90.00
             UUID : 2e8294de:9b0d8d96:680a5413:2aac5c13
    Creation Time : Sat Mar  7 11:48:30 2009
       Raid Level : raid1
    Used Dev Size : 483138688 (460.76 GiB 494.73 GB)
       Array Size : 483138688 (460.76 GiB 494.73 GB)
     Raid Devices : 2
    Total Devices : 2
  Preferred Minor : 1
      Update Time : Fri Nov 29 23:03:41 2013
            State : clean
   Active Devices : 1
  Working Devices : 2
   Failed Devices : 1
    Spare Devices : 1
         Checksum : be8bd27f - correct
           Events : 72078

        Number   Major   Minor   RaidDevice State
  this     2       8       35        2      spare   /dev/sdc3

     0     0       8       51        0      active sync   /dev/sdd3
     1     1       0        0        1      faulty removed
     2     2       8       35        2      spare   /dev/sdc3

What is the problem? And how can I recover a correct md1 array?

Thank you.

--
François Patte
Re: raid problem
On 11/29/2013 4:43 PM, François Patte wrote:
> Good evening,
>
> I have a problem with two RAID arrays: I have two disks (sdc and sdd) in
> RAID1 arrays. One disk (sdc) failed and I replaced it with a new one.
> [...] There was no problem with the md0 array, but for the md1 array, I
> get:
>
>   md1 : active raid1 sdc3[2](S) sdd3[0]
>         483138688 blocks [2/1] [U_]
>
> What is the problem? And how can I recover a correct md1 array?

IIRC Linux md rebuilds multiple degraded arrays sequentially, not in
parallel, due to the system performance impact and other reasons. When the
rebuild of md0 is finished, the rebuild of md1/sdc3 should start
automatically. If this did not occur, please let us know and we'll go from
there.

--
Stan
Re: Marvell SATA/Raid problem
On Saturday 23 June 2012 05:28:17, Camaleón wrote:
> > Are there any known problems with Marvell SATA and Wheezy 64-bit?
> (...)
> None that I'm aware of :-? Anyway, something that was working fine in
> Squeeze is expected to be working in upcoming kernel versions. Unless
> you missed something, consider opening a bug report for a possible
> regression.

The Highpoint's problem persisted when I tried Squeeze 32-bit, so I think
it's hardware. I'll ponder a bug report for that (low probability), as
well as one for inotifywait (a separate thread).

On Monday I replaced the Highpoint with a 'Best Connectivity'
SIL3132-based 2-port RAID board, and it's just like the farmer of song:
it's working happily in the Dell. It boots and reboots with and without
the drive in the dataport turned on, hot plug is working as expected, and
my plug-n-play backup is nearly flawless (with a workaround for the
inotifywait problem).

N
Marvell SATA/Raid problem
Wheezy 64-bit. Marvell PCIe SATA/RAID card with one drive (in a CRU DP10),
non-RAID. The main board has two identical Hitachi 1TB drives on on-board
SATA ports, used in md RAID.

Running Squeeze 32-bit, it was handling hot-plugged drives just fine.
Switched to Wheezy 64-bit, and it no longer detects hot-plugged drives. In
fact, it won't boot with a drive connected to the Marvell card (plugged
into the DP10).

Are there any known problems with Marvell SATA and Wheezy 64-bit?

I have the system here; I'll re-try Wheezy 64-bit this weekend, and try
Wheezy 32-bit and Squeeze. (If Squeeze 32-bit doesn't work, it'll be a
good clue, since it worked before.)
Re: Marvell SATA/Raid problem
On Sat, 23 Jun 2012 03:04:16 -0400, Neal Murphy wrote:
> Wheezy 64-bit. Marvell PCIe SATA/RAID card with one drive (in a CRU
> DP10), non-RAID. The main board has two identical Hitachi 1TB drives on
> on-board SATA ports, used in md RAID.

So you have a total of 3 hard disks, one connected to the PCI-e card and
the other two attached to the stock mainboard SATA ports, right?

> Running Squeeze 32-bit, it was handling hot-plugged drives just fine.
> Switched to Wheezy 64-bit and it no longer detects hot-plugged drives.
> In fact, it won't boot with a drive connected to the Marvell card
> (plugged into the DP10).

The change of architecture (32 → 64 bits) seems to indicate that you did a
clean install, from scratch, right?

Questions:

- Did the installer even see any of the 3 drives?
- Output of lspci (so we can see the Marvell chipsets involved)
- Output of messages (when you cannot boot, kernel logs, dmesg...)
- What's your hdd layout? Where is the system installed, and what RAID
  level are you using? I guess it is a RAID 1, but I prefer to ask.

Give more details about your system; you provided sparse data, and
precision is a must for this kind of problem :-)

> Are there any known problems with Marvell SATA and Wheezy 64-bit?
(...)

None that I'm aware of :-? Anyway, something that was working fine in
Squeeze is expected to be working in upcoming kernel versions. Unless you
missed something, consider opening a bug report for a possible regression.

Greetings,

--
Camaleón
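Gathered as commands, the information requested above would look something
like this (a sketch; the grep patterns are just a convenience, not exact):

  lspci | grep -i marvell           # which Marvell chipset the card carries
  dmesg | grep -i -e ata -e sata    # kernel messages about the SATA ports
  cat /proc/mdstat                  # current state of the md arrays
  cat /proc/partitions              # which drives and partitions were seen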
Re: Can't reboot after power failure (RAID problem?)
On 11-01-31 8:47 PM, Andrew Reid wrote:
> The easy way out is to boot from a rescue disk, fix the mdadm.conf file,
> rebuild the initramfs, and reboot.
>
> The Real Sysadmin way is to start the array by hand from inside the
> initramfs. You want mdadm -A /dev/md0 (or possibly mdadm -A -u
> your-uuid) to start it, and once it's up, ctrl-d out of the initramfs
> and hope. The part I don't remember is whether or not this creates the
> symlinks in /dev/disk that your root-fs-finder is looking for.

All's well. After the Real Sysadmin way got me into the system one time
only, I could do the easy way, which is more permanent, without needing a
rescue disk. Thank you so much.

I have one more question, just out of curiosity, so bottom priority. Why
does this work? mdadm.conf is in the initramfs, which is in /boot, which
is on /dev/md0, but /dev/md0 doesn't exist until the arrays are assembled,
which requires mdadm.conf.

David
Re: Can't reboot after power failure (RAID problem?)
In 4d49816b.1040...@alcor.concordia.ca, David Gaudine wrote:
> I have one more question, just out of curiosity, so bottom priority. Why
> does this work? mdadm.conf is in the initramfs, which is in /boot, which
> is on /dev/md0, but /dev/md0 doesn't exist until the arrays are
> assembled, which requires mdadm.conf.

Finding the initramfs on disk and copying it into RAM is not actually done
by the kernel. It is done by the boot loader, the same way the boot loader
finds the kernel image on disk and copies it into RAM. As such, it doesn't
use kernel features to load the initramfs.

There are a number of techniques that boot loaders use to do this magic.
GRUB normally uses the gap between the partition table and the first
partition to store enough modules to emulate the kernel's dm/md layer,
plus one or more of the kernel's file system modules, in order to do the
loading. If those modules are not available, or not in sync with how the
kernel handles things, GRUB could fail to read the kernel image or
initramfs, or it could think it read both and transfer control to a kernel
that is just random data from the disk.

--
Boyd Stephen Smith Jr.
b...@iguanasuicide.net
http://iguanasuicide.net/
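One practical consequence of the above: after a GRUB or kernel upgrade it
is worth refreshing the boot sector on every member of the mirror, so the
embedded modules stay in sync with the installed GRUB (a sketch; sda and
sdb stand for whatever disks carry your boot sectors):

  grub-install /dev/sda
  grub-install /dev/sdb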
Re: Can't reboot after power failure (RAID problem?)
Hello,

dav...@alcor.concordia.ca wrote:
> My system went down because of a power failure, and now it won't start.
> I use RAID 1, and I don't know if that's related to the problem. The
> screen shows the following.
>
>   Loading, please wait...
>   Gave up waiting for root device. Common problems:
>   - Boot args (cat /proc/cmdline)
>     - Check rootdelay= (did the system wait long enough?)
>     - Check root= (did the system wait for the right device?)
>   - Missing modules (cat /proc/modules; ls /dev)
>   ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does not
>   exist. Dropping to a shell!
>
> I don't know if that uuid is an MD device, but it seems likely. Grub is
> installed on each disk, and I previously tested the RAID 1 arrays by
> unplugging each disk one at a time and was able to boot to either.

The kernel and initramfs started, so grub did its job and does not seem to
be the problem. Are the disks, partitions and RAID devices present in
/proc/partitions and /dev/? What does /proc/mdstat contain? Any related
messages in dmesg?
Re: Can't reboot after power failure (RAID problem?)
On 11-01-31 8:47 PM, Andrew Reid wrote:
> What about the actual device? Does /dev/md/0 (or /dev/md0, or whatever)
> exist? If the module is loaded but the device does not exist, then it's
> possible there's a problem with your mdadm.conf file, and the initramfs
> doesn't have the array info in it, so it wasn't started.
>
> The easy way out is to boot from a rescue disk, fix the mdadm.conf file,
> rebuild the initramfs, and reboot.
>
> The Real Sysadmin way is to start the array by hand from inside the
> initramfs. [...]

I found the problem. You're right, mdadm.conf was the problem, which is
amazing considering that I had previously restarted without changing
mdadm.conf. I edited it in the initramfs, then did mdadm -A /dev/md0 as
you suggested, and control-d worked. I assume I'll still have to rebuild
the initramfs; I might need handholding, but I'll google first.

I think what went wrong might interest some people, since it answers a
question I previously raised under the subject "RAID1 with multiple
partitions". There was no consensus, so I made the wrong choice.

The cause of the problem is that I set up my system under a temporary
hostname and then changed the hostname. The hostname appeared at the end
of each ARRAY line in mdadm.conf, and I didn't know whether I should
change it there, because I didn't know whether it has to match the
hostname in the current /etc/hosts, has to match the current hostname, or
is just a meaningless label. I changed it to the new hostname at the same
time that I changed the hostname, then shut down and restarted. It booted
fine. I did the same thing on another computer, and I'm sure I restarted
that one successfully several times. So I foolishly thought I was safe.
After the power failure it wouldn't boot.

After following your advice I was sufficiently inspired to edit mdadm.conf
back to the original hostname, mount my various md's, and control-d. I
assume I'll have to do that every time I boot until I rebuild the
initramfs.

Thank you very much. I'd already recovered everything from a backup, but I
needed to find the solution, or I'd be afraid of RAID in future.

David
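The rebuild David refers to is a one-liner on Debian once the corrected
mdadm.conf is in place on the real root filesystem (update-initramfs is
the standard tool; -k all regenerates the initramfs for every installed
kernel, not just the running one):

  update-initramfs -u -k all
  reboot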
Re: Can't reboot after power failure (RAID problem?)
On Tue, Feb 1, 2011 at 10:38 AM, David Gaudine wrote:
> The cause of the problem is that I set up my system under a temporary
> hostname and then changed the hostname. The hostname appeared at the end
> of each ARRAY line in mdadm.conf [...] After following your advice I was
> sufficiently inspired to edit mdadm.conf back to the original hostname,
> mount my various md's, and control-d.

If you'd like the homehost in mdadm.conf to be the same as the hostname,
you could break your boot in the initramfs and assemble the array with
mdadm --assemble /dev/mdX --homehost=whatever --update=homehost /dev/sdXX.
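The same command with the placeholders made explicit (mdX, sdXX, sdYY and
"whatever" are all to be substituted; --update=homehost rewrites the
homehost field stored in each member's superblock while assembling):

  mdadm --assemble /dev/mdX --homehost=whatever --update=homehost \
        /dev/sdXX /dev/sdYY

After that, the hostname recorded in the superblocks matches the new
hostname, and the ARRAY lines in mdadm.conf can carry the new name too.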
Re: Can't reboot after power failure (RAID problem?)
I posted in a panic and left out a lot of details. I'm using Squeeze, and
set up the system about a month ago, so there have been some upgrades. I
wonder if maybe the kernel or Grub was upgraded and I neglected to install
Grub again, but I would expect it to automatically be reinstalled on at
least the first disk. If I remove either disk I get the same error
message.

I did look at /proc/cmdline. It shows the same uuid for the root device as
in the menu, so that seems to prove it's an MD device that isn't ready,
since my boot and root partitions are each on MD devices. /proc/modules
does show md_mod.

David

-------- Original Message --------
Subject: Can't reboot after power failure (RAID problem?)
From: dav...@alcor.concordia.ca
Date: Mon, January 31, 2011 10:18 am
To: debian-user@lists.debian.org

> My system went down because of a power failure, and now it won't start.
> I use RAID 1, and I don't know if that's related to the problem. [...]
> ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does not
> exist. Dropping to a shell!
Can't reboot after power failure (RAID problem?)
My system went down because of a power failure, and now it won't start. I
use RAID 1, and I don't know if that's related to the problem. The screen
shows the following.

  Loading, please wait...
  Gave up waiting for root device. Common problems:
  - Boot args (cat /proc/cmdline)
    - Check rootdelay= (did the system wait long enough?)
    - Check root= (did the system wait for the right device?)
  - Missing modules (cat /proc/modules; ls /dev)
  ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does not
  exist. Dropping to a shell!

I don't know if that uuid is an MD device, but it seems likely. Grub is
installed on each disk, and I previously tested the RAID 1 arrays by
unplugging each disk one at a time and was able to boot to either.

Ideas?

David
Re: Can't reboot after power failure (RAID problem?)
On Monday 31 January 2011 10:51:04, dav...@alcor.concordia.ca wrote:
> I did look at /proc/cmdline. It shows the same uuid for the root device
> as in the menu, so that seems to prove it's an MD device that isn't
> ready, since my boot and root partitions are each on MD devices.
> /proc/modules does show md_mod.

What about the actual device? Does /dev/md/0 (or /dev/md0, or whatever)
exist? If the module is loaded but the device does not exist, then it's
possible there's a problem with your mdadm.conf file, and the initramfs
doesn't have the array info in it, so it wasn't started.

The easy way out is to boot from a rescue disk, fix the mdadm.conf file,
rebuild the initramfs, and reboot.

The Real Sysadmin way is to start the array by hand from inside the
initramfs. You want mdadm -A /dev/md0 (or possibly mdadm -A -u your-uuid)
to start it, and once it's up, ctrl-d out of the initramfs and hope. The
part I don't remember is whether or not this creates the symlinks in
/dev/disk that your root-fs-finder is looking for.

It may be better to boot with break=premount to get into the initramfs in
a more controlled state, instead of trying to fix it in the
already-errored state, assuming you try the initramfs thing at all. And
further assuming that the mdadm.conf file is the problem, which was pretty
much guesswork on my part...

-- A.

--
Andrew Reid / rei...@bellatlantic.net
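The controlled variant described above, as a sketch (uuid and device name
are placeholders to be substituted):

  # at the boot loader prompt, append to the kernel command line:
  #   break=premount
  # then, inside the initramfs shell:
  mdadm -A /dev/md0               # or: mdadm -A -u <your-uuid> /dev/md0
  exit                            # (or ctrl-d) resume the normal boot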
Debian Etch - LSI 8704ELP hardware raid problem.
Hi all,

I have installed Etch on an LSI 8704ELP HW RAID controller. The
installation process went just fine, but after a reboot the boot process
froze. I would be glad of any suggestions.

Regards,
Jarek
Re: Raid Problem
After the reinstallation of Sarge, the error is still there:

  mdadm --examine /dev/md0
  mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got )

The two disks do come up as an array, though. Strange!
Raid Problem
Hello list,

I still have the problem with the RAID1 that is started with only one
drive after a system restart. The following was suggested on the list, but
it doesn't help:

  mdadm --zero-superblock /dev/hda1
  mdadm --zero-superblock /dev/hdc1
  dpkg-reconfigure mdadm

Can someone explain to me what this superblock is? dd doesn't seem to be
able to solve the problem either, when I mirror both disks.

Regards,
Dani

fd-fhk-03657:~# mdadm --misc --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Tue Oct  3 13:45:01 2006
     Raid Level : raid1
     Array Size : 242187776 (230.97 GiB 248.00 GB)
    Device Size : 242187776 (230.97 GiB 248.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Mon Oct 16 18:13:53 2006
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : 876ebbcd:22d4e45a:885f088a:e77f9967
         Events : 0.31136

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1      22        1        1      active sync   /dev/hdc1

fd-fhk-03657:~# mdadm --examine /dev/md0
mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got )
fd-fhk-03657:~# dpkg-reconfigure mdadm
Starting raid devices: done.
Starting RAID monitor daemon: mdadm -F.
fd-fhk-03657:~# mdadm --examine /dev/md0
mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got )
fd-fhk-03657:~# fsck /dev/md0
fd-fhk-03657:~# mdadm --examine /dev/md0
mdadm: No super block found on /dev/md0 (Expected magic a92b4efc, got )
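One detail worth noting about the error message itself: with 0.90 metadata
the md superblock lives on the member partitions, not on the assembled
device, so "mdadm --examine /dev/md0" is expected to find nothing even on
a healthy array. The places to look are (device names from the post
above):

  mdadm --examine /dev/hda1    # the member superblocks live here
  mdadm --examine /dev/hdc1
  mdadm --detail /dev/md0      # the assembled array is queried with --detail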
RAID problem: data lost on RAID 6...
Hi,

My Debian 3.1 (x86_64) system has suffered a very nasty mishap. First, the
IOMMU code ran out of space to map I/O to the SATA drives. This looked to
the md code like a faulty drive, so one by one, drives were marked
'failed', until three components had failed out of the seven-drive array,
at which point it no longer functioned.

After rebooting, I got five drives back into the array: enough for it to
'run' and be fscked. Almost recovered! Then, a genuine drive failure, with
lots of entries like this in syslog:

  end_request: I/O error, dev sde, sector 4057289
  Buffer I/O error on device sde2, logical block 962111
  ATA: abnormal status 0xD8 on port 0xC2010287
  ATA: abnormal status 0xD8 on port 0xC2010287
  ATA: abnormal status 0xD8 on port 0xC2010287
  ata7: command 0x25 timeout, stat 0xd8 host_stat 0x1
  ata7: translated ATA stat/err 0xd8/00 to SCSI SK/ASC/ASCQ 0xb/47/00
  ata7: status=0xd8 { Busy }
  sd 6:0:0:0: SCSI error: return code = 0x802
  sde: Current: sense key=0xb ASC=0x47 ASCQ=0x0

Of course, with two drives already (wrongly) marked 'failed', there's
nothing to rebuild with... Is there a way I can 'unfail' one of those two
drives and rebuild from that? Trying to 'assemble' the array just results
in the other two being marked as spare components, and then I'm told that
4 drives and 1 spare isn't enough to start a 7-drive array.

James.
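The usual way to "unfail" members whose event counters are only slightly
behind is a forced assembly; mdadm then accepts the freshest of the failed
members and starts the array degraded (a sketch; the array name and member
list are guesses, and a filesystem recovered this way must be checked
before trusting it):

  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sd[a-g]1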
Re: SW Raid Problem
Klaus Zerwes wrote:
> :-) You can't be serious
> http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html

As I understand it, that howto is based on the raidtools, which one really
shouldn't be using any more. There, RAID 1 is described with its
configuration in /etc/raidtab, like this:

Set up the /etc/raidtab file like this:

  raiddev /dev/md0
          raid-level            1
          nr-raid-disks         2
          nr-spare-disks        0
          persistent-superblock 1
          device                /dev/sdb6
          raid-disk             0
          device                /dev/sdc5
          raid-disk             1

Since I want to work with mdadm, this variant looks useless to me, because
according to man mdadm:

  mdadm does not use /etc/raidtab, the raidtools configuration file, at
  all. It has a different configuration file with a different format and a
  different purpose.

/etc/raidtab is not needed, so I can't do all that much with the howto. I
have found other pages, but they are only ever partly helpful.

Regards,
Torsten
Re: SW Raid Problem
Hello Torsten,

On Fri, 23 Dec 2005 19:08:25 +0100, Torsten Geile wrote:
> Since I want to work with mdadm, this variant looks useless to me,
> because according to man mdadm

It was the same for me.

> /etc/raidtab is not needed, so I can't do all that much with the howto.
> I have found other pages, but they are only ever partly helpful.

Maybe the howto at http://ralf-schmidt.de/privat/computer/raid1-lvm.htm
will help you.

HTH,
ciao
Ralf Schmidt
Re: SW Raid Problem
Torsten Geile wrote:
> On Mon, 19 Dec 2005 12:00:47 +0100, Klaus Zerwes wrote:
>> There you have a fundamental problem: you have not understood software
>> RAID. You create file systems on a device (md0), not on the individual
>> components of the device.
>
> Yes, that much was clear to me, but the same error also occurred before
> creating the file system, which is why I tried it that way. What was not
> clear to me, however, is the fact that you first have to set up a RAID
> with the missing device and then copy the data over later.

An example. Say your RAID is to be of type 1 and contain the partitions
sda1 and sdb1, and at the moment sda1 is formatted as a normal partition
with ext2.

- If you simply create a RAID over sda1 and sdb1 in their current state
  and then put a file system on it, your data is gone!

- If you have enough storage to park your data somewhere, proceed as
  follows:
  - copy the data away
  - unmount sda1
  - create the RAID1 with both devices
  - put a file system on it
  - mount the RAID
  - copy the data back

- If not, you have to do the "missing" trick (see the sketch after this
  message):
  - create the RAID md0 with sda1 as "missing", and mkfs
  - mount md0 and copy the data onto it
  - unmount sda1 and add it to the RAID, which triggers the sync

- There is actually a way to convert a non-RAID partition to RAID
  directly, but I have never used it, so I'll keep quiet on that subject.

In all cases, if you want autodetection, you must set the partition type
to 0xFD and use a persistent superblock.

> > If you are nice to Google and feed it the right terms, you get to the
> > right page, to the Software RAID HOWTO ;-)
>
> I had already fed it, but the documents were all not that thorough.

:-) You can't be serious
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html

[...]

A backup BEFORE starting obviously goes without saying...

--
Klaus Zerwes
http://www.zero-sys.net
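The "missing" trick from the list above, as concrete commands (partition
names from this thread: sdb7 is on the new disk, sda8 holds the existing
/home; paths and the cp invocation are illustrative):

  mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdb7
  mkfs.ext3 /dev/md0
  mount /dev/md0 /mnt
  cp -a /home/. /mnt/              # copy the data onto the degraded mirror
  umount /home                     # sda8 must no longer be in use
  mdadm /dev/md0 --add /dev/sda8   # triggers the sync onto the old partition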
Re: SW Raid Problem
Hi,

On Mon, 19 Dec 2005 12:00:47 +0100, Klaus Zerwes wrote:
> There you have a fundamental problem: you have not understood software
> RAID. You create file systems on a device (md0), not on the individual
> components of the device.

Yes, that much was clear to me, but the same error also occurred before
creating the file system, which is why I tried it that way. What was not
clear to me, however, is the fact that you first have to set up a RAID
with the missing device and then copy the data over later.

> If you are nice to Google and feed it the right terms, you get to the
> right page, to the Software RAID HOWTO ;-)

I had already fed it, but the documents were all not that thorough.

> Afterwards, roughly like this:
> - on the new disk, set up a RAID with a missing device
> - transfer the data from the old disk
> - test whether the RAID device is initialised at boot
> - (if need be, build an initrd / rebake the kernel ...)
> - add the old disk to the RAID
>
> A backup BEFORE starting obviously goes without saying...

Regards,
Torsten
SW Raid Problem
Hello,

I have retrofitted an identical SATA HDD into the system and now want it
to work at RAID1 level. To that end, I replicated the partitioning of sda
onto sdb block-for-block with fdisk, leaving out the swap partition. With

  mdadm --create --verbose /dev/md0 --level=1 --raid-device=2 /dev/sda8 /dev/sdb7

I get the error message

  mdadm: ADD_NEW_DISK for /dev/sda8 failed: Device or resource busy

although sda8 (the /home directory) is not mounted. sdb7 was created with
mkfs.ext3. How exactly do I set up the SW RAID1?

Regards,
Torsten
Re: SW Raid Problem
Torsten Geile wrote:
> Hello,
>
> I have retrofitted an identical SATA HDD into the system and now want it
> to work at RAID1 level. [...] although sda8 (the /home directory) is not
> mounted. sdb7 was created with mkfs.ext3. How exactly do I set up the SW
> RAID1?

There you have a fundamental problem: you have not understood software
RAID. You create file systems on a device (md0), not on the individual
components of the device.

If you are nice to Google and feed it the right terms, you get to the
right page, to the Software RAID HOWTO ;-) You should read that
thoroughly, along with the documentation for mdadm.

Afterwards, roughly like this:
- on the new disk, set up a RAID with a missing device
- transfer the data from the old disk
- test whether the RAID device is initialised at boot
- (if need be, build an initrd / rebake the kernel ...)
- add the old disk to the RAID

A backup BEFORE starting obviously goes without saying.

--
Klaus Zerwes
http://www.zero-sys.net
Software RAID problem - disks names change in case one fails
Hello,

I'm testing a server before I put it in production, and I've got a problem
with mdraid. The config:

- Dell PowerEdge 800
- 4 x 250 GB SATA drives attached to the mobo
- /boot: 4 x 1 GB (1 GB available) in RAID1, 3 active + 1 spare
- /: 4 x 250 GB (500 GB available) in RAID5, 3 active + 1 spare

No problems at install, and the server runs OK. Then I stop the server and
remove /dev/sdb to simulate a hard disk failure that has caused a crash
and a reboot.

With the second disk removed, the disk names change: the 3rd disk /dev/sdc
becomes /dev/sdb, and the 4th disk (which was the spare disk) /dev/sdd
becomes /dev/sdc. During the boot process md detects that there is a
problem, but then complains that it can't find the /dev/sdd spare disk,
and the boot process stops with a kernel panic.
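Identifying the arrays by UUID instead of by member device names sidesteps
the renumbering, since mdadm can then find the members wherever they land
after a reboot. A minimal /etc/mdadm/mdadm.conf along those lines (a
sketch; the UUIDs come from running mdadm --detail on each array):

  DEVICE partitions
  ARRAY /dev/md0 UUID=<uuid-of-md0>
  ARRAY /dev/md1 UUID=<uuid-of-md1>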
Re: Software RAID problem - disks names change in case one fails
On Fri, 9 Sep 2005, [EMAIL PROTECTED] wrote:
> I'm testing a server before I put it in production, and I've got a
> problem with mdraid. [...] Then I stop the server and remove /dev/sdb to
> simulate a hard disk failure that has caused a crash and a reboot.

Perfect test, and do the same for each disk. It is pointless to have one
spare disk in the raid array.

Have you ever wondered about other folks that try to build a SATA-based
raid subsystem? How did their SATA pass the failed-disk test if it
reassigns its drive numbers upon reboot?

> With the second disk removed the disk names are changed,

Exactly; that is the problem with scsi. Pull the power cord from the disk
to simulate the disk failure, or pull the sata cable... In either case, if
the disks rename themselves based on who's alive, raid won't boot after
the failed disk (but it will stay running until it's rebooted).

> the 3rd disk /dev/sdc becomes /dev/sdb and the 4th disk (that was the
> spare disk) /dev/sdd becomes /dev/sdc.

That's always been true of scsi.

> During the boot process md detects that there is a problem, but then
> complains it can't find the /dev/sdd spare disk and the boot process
> stops with a kernel panic error.

Exactly.

c ya
alvin
Re: Software RAID problem - disks names change in case one fails
> > - have you ever wondered about other folks that try to build a
> >   sata-based raid subsystem?
> > - how did their sata pass the failed-disk test if it reassigns its
> >   drive numbers upon reboot?
>
> > With the second disk removed the disk names are changed,
>
> exactly; that is the problem with scsi [...]
>
> > the 3rd disk /dev/sdc becomes /dev/sdb and the 4th disk (that was the
> > spare disk) /dev/sdd becomes /dev/sdc.
>
> that's always been true of scsi

That's why devfs used names that wouldn't change, e.g.
/dev/scsi/bus0/host0/target0/lun0/part1. Of course, that's been deprecated
in the newer 2.6 kernels, and I don't know udev, so I don't know if it has
a similarly helpful naming scheme... anyone?
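udev does provide stable names, under /dev/disk/ (available with 2.6-era
udev; the symlink shown is illustrative, not from this system):

  ls -l /dev/disk/by-id/
  # e.g. ata-HITACHI_HDS722525VLSA80_VN6B3EC5-part1 -> ../../sda1
  ls -l /dev/disk/by-uuid/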
Raid 10, mdadm, mdadm-raid problem
I've been having an enjoyable time tinkering with software RAID with Sarge
and the RC2 installer. The system boots fine with RAID 1 for /boot and
RAID 5 for /. I decided to experiment with RAID 10 for /opt, since there's
nothing there to destroy :).

Using mdadm to create a RAID 0 array from two RAID 1 arrays was simple
enough, but getting the RAID 10 array activated at boot isn't working
well. I used update-rc.d to add the symlinks to mdadm-raid using the
defaults, but the RAID 10 array isn't assembled at boot time.

After getting kicked to a root shell, if I check /proc/mdstat, only md1
(/) is started. After running mdadm-raid start, md0 (/boot), md2, and md3
start. If I run mdadm-raid start again, md4 (/opt) starts. Fsck'ing the
newly assembled arrays before successfully issuing 'mount -a' shows no
filesystem errors. I'm at a loss, and haven't found any mention of a
similar issue on this list or the linux-raid list.

Here's mdadm.conf:

DEVICE partitions
DEVICE /dev/md*
ARRAY /dev/md4 level=raid0 num-devices=2 UUID=bf3456d3:2af15cc9:18d816bf:d630c183
   devices=/dev/md2,/dev/md3
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a51da14e:41eb27ad:b6eefb94:21fcdc95
   devices=/dev/sdb5,/dev/sde5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ac25a75b:3437d397:c00f83a3:71ea45de
   devices=/dev/sda5,/dev/sdc5
ARRAY /dev/md1 level=raid5 num-devices=4 spares=1 UUID=efec4ae2:1e74d648:85582946:feb98f0c
   devices=/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sde3,/dev/sdd3
ARRAY /dev/md0 level=raid1 num-devices=4 spares=1 UUID=04209b62:6e46b584:06ec149f:97128bfb
   devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sde1,/dev/sdd1

Roger
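One thing that stands out in that mdadm.conf: the stacked /dev/md4 is
listed before its components /dev/md2 and /dev/md3, so a single assembly
pass tries md4 first, fails (its members don't exist yet), and only brings
up the mirrors, which matches mdadm-raid having to be run twice. A
plausible fix (a guess, not tested) is simply to order the ARRAY lines
components-first:

  ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ac25a75b:3437d397:c00f83a3:71ea45de
  ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a51da14e:41eb27ad:b6eefb94:21fcdc95
  ARRAY /dev/md4 level=raid0 num-devices=2 UUID=bf3456d3:2af15cc9:18d816bf:d630c183
     devices=/dev/md2,/dev/md3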
RAID problem
Hello :)

I have never dealt with RAID before, and now I have to install a system on
a machine with RAID5. Somehow Debian doesn't see such a device after
starting... What should I do about it? Should I prepare a boot floppy on
another machine? The computer is a Dell PowerEdge 2500.

--
KRZYZAK
LC4 640 Czerwony Grejfrut :)
Skierniewice
http://www.krzyzak.motocykle.org
Re: RAID problem
On Thu, Oct 03, 2002 at 10:47:38AM +0200, Jacek Krzyzanowski wrote:
> Hello :) I have never dealt with RAID before, and now I have to install
> a system on a machine with RAID5. Somehow Debian doesn't see such a
> device after starting... What should I do about it? Should I prepare a
> boot floppy on another machine? The computer is a Dell PowerEdge 2500.

You need compiled-in support for the RAID controllers. The ones fitted in
Dells are usually handled by the megaraid or aacraid modules. If they are
not in the installation kernel (IIRC the first one is, even in potato),
the simplest thing is to attach one disk to the ordinary SCSI controller
on the board, install the system on it, then compile a suitable kernel and
move the whole thing over to the RAID.

Regards,
rp.

--
I WAS AT A PARTY, he added, a shade reproachfully.
-- Death is summoned by the Wizards (Terry Pratchett, The Light Fantastic)
Re: RAID problem
If it's potato, try running the installer from CD2. There is a kernel
there that includes (I think) all the RAID drivers. If it's woody, you
have to check which disc is still bootable and try from that one. That's
exactly what I did when I installed potato.

Regards,
CB

----- Original Message -----
From: Robert Pyciarz [EMAIL PROTECTED]
To: debian-user-list debian-user-polish@lists.debian.org
Sent: Thursday, October 03, 2002 11:42 AM
Subject: Re: RAID problem

> You need compiled-in support for the RAID controllers. The ones fitted
> in Dells are usually handled by the megaraid or aacraid modules. [...]
Re: RAID problem
On Thu, Oct 03, 2002 at 11:42:50AM +0200, Robert Pyciarz wrote:
> On Thu, Oct 03, 2002 at 10:47:38AM +0200, Jacek Krzyzanowski wrote:
>> I have never dealt with RAID before, and now I have to install a system
>> on a machine with RAID5. Somehow Debian doesn't see such a device after
>> starting... What should I do about it? Should I prepare a boot floppy
>> on another machine? The computer is a Dell PowerEdge 2500.
>
> You need compiled-in support for the RAID controllers. The ones fitted
> in Dells are usually handled by the megaraid or aacraid modules. If they
> are not in the installation kernel (IIRC the first one is, even in
> potato), the simplest thing is to attach one disk to the ordinary SCSI
> controller on the board, install the system on it, then compile a
> suitable kernel and move the whole thing over to the RAID.

You can also compile your own installation kernel (its config is on the
disc in discs/) with RAID support and start the installation either via
loadlin (if you have a vfat partition and DOS) or from floppies (one is a
suitably prepared root.bin with the kernel swapped out, the other is
rescue.bin). Although, to tell the truth, I don't know whether the module
selection step of the installer doesn't offer the RAID modules anyway;
after loading them the array should be accessible.

--
Marcin 'Szczepan|Hrw' Juszkiewicz
mailto: marcinatamigadotpl
my Debian packages: deb http://users.stone.pl/szczepan/ apt/
Re: RAID problem
On Thu, Oct 03, 2002 at 10:47:38AM +0200, Jacek Krzyzanowski wrote:
> Hello :) I have never dealt with RAID before, and now I have to install
> a system on a machine with RAID5. Somehow Debian doesn't see such a
> device after starting... What should I do about it? Should I prepare a
> boot floppy on another machine?

Is this a hardware array, or Linux MD? If it's MD, then for the kernel to
detect the array at boot, it is enough to use fdisk to change the type of
the partitions in the array to 'fd' (hex), i.e. 'Linux raid autodetect'.
In general, I recommend
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html.

konik
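Changing the partition type with fdisk, step by step (the disk and
partition number are examples, not from this thread):

  fdisk /dev/sda
  #  t    - change a partition's type
  #  1    - partition number
  #  fd   - 'Linux raid autodetect'
  #  w    - write the table and exit

Non-interactively, sfdisk can do the same thing:

  sfdisk --change-id /dev/sda 1 fd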