Re: Can't reboot after power failure (RAID problem?)
On 11-01-31 8:47 PM, Andrew Reid wrote:
> The easy way out is to boot from a rescue disk, fix the mdadm.conf
> file, rebuild the initramfs, and reboot.
>
> The Real Sysadmin way is to start the array by hand from inside the
> initramfs. You want mdadm -A /dev/md0 (or possibly mdadm -A -u
> your-uuid) to start it, and once it's up, ctrl-d out of the initramfs
> and hope. The part I don't remember is whether or not this creates the
> symlinks in /dev/disk that your root-fs-finder is looking for.

All's well. After the Real Sysadmin way got me into the system one time
only, I could do the easy way, which is more permanent, without needing
a rescue disk. Thank you so much.

I have one more question, just out of curiosity, so bottom priority.
Why does this work? mdadm.conf is in the initramfs, which is in /boot,
which is on /dev/md0, but /dev/md0 doesn't exist until the arrays are
assembled, which requires mdadm.conf.

David
Re: Can't reboot after power failure (RAID problem?)
In 4d49816b.1040...@alcor.concordia.ca, David Gaudine wrote:
> I have one more question, just out of curiosity, so bottom priority.
> Why does this work? mdadm.conf is in the initramfs, which is in /boot,
> which is on /dev/md0, but /dev/md0 doesn't exist until the arrays are
> assembled, which requires mdadm.conf.

Finding the initramfs on disk and copying it into RAM is not actually
done by the kernel. It is done by the boot loader, the same way the boot
loader finds the kernel image on disk and copies it into RAM. As such,
it doesn't use kernel features to load the initramfs.

There are a number of techniques boot loaders use to pull off this
magic. GRUB normally uses the gap between the partition table and the
first partition to store enough modules to emulate the kernel's dm/md
layer, plus one or more of the kernel's file system modules, in order to
do the loading. If those modules are not available, or not in sync with
how the kernel handles things, GRUB could fail to read the kernel image
or initramfs, or it could think it read both and transfer control to a
"kernel" that is just random data from the disk.

-- 
Boyd Stephen Smith Jr.
b...@iguanasuicide.net
ICQ: 514984 YM/AIM: DaTwinkDaddy
http://iguanasuicide.net/
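[In practice, when GRUB's embedded modules have drifted out of sync with
the disks, the usual remedy is to reinstall GRUB on each disk. A minimal
sketch, assuming a two-disk RAID 1 on sda and sdb (the device names are
placeholders, and exact md module names vary between GRUB 2 versions):

  # grub-install detects where /boot lives and embeds the needed
  # md/filesystem modules into the post-MBR gap on its own:
  grub-install /dev/sda
  grub-install /dev/sdb
  # If embedding needs forcing, --modules can name modules explicitly,
  # e.g. (treat these names as examples; they differ across versions):
  # grub-install --modules="mdraid ext2" /dev/sda
]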
Re: Can't reboot after power failure (RAID problem?)
Hello,

dav...@alcor.concordia.ca wrote:
> My system went down because of a power failure, and now it won't
> start. I use RAID 1, and I don't know if that's related to the
> problem. The screen shows the following.
>
>   Loading, please wait...
>   Gave up waiting for root device.  Common problems:
>    - Boot args (cat /proc/cmdline)
>      - Check rootdelay= (did the system wait long enough?)
>      - Check root= (did the system wait for the right device?)
>    - Missing modules (cat /proc/modules; ls /dev)
>   ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does
>   not exist. Dropping to a shell!
>
> I don't know if that uuid is an MD device, but it seems likely. Grub
> is installed on each disk, and I previously tested the RAID 1 arrays
> by unplugging each disk one at a time and was able to boot to either.

The kernel and initramfs started, so GRUB did its job and does not seem
to be the problem.

Are the disks, partitions and RAID devices present in /proc/partitions
and /dev/? What does /proc/mdstat contain? Any related messages in
dmesg?
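[Concretely, those checks can all be run from the BusyBox shell that the
failed boot drops into. A sketch; which applets are available in the
initramfs varies, so treat these as suggestions:

  cat /proc/partitions                # are the member disks/partitions visible?
  cat /proc/mdstat                    # were any md arrays assembled, even partially?
  ls -l /dev/md* /dev/disk/by-uuid/   # does the expected node or symlink exist?
  dmesg | grep -i -e raid -e md       # any RAID-related kernel messages?
]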
Re: Can't reboot after power failure (RAID problem?)
On 11-01-31 8:47 PM, Andrew Reid wrote:
> On Monday 31 January 2011 10:51:04 dav...@alcor.concordia.ca wrote:
>> I posted in a panic and left out a lot of details. I'm using Squeeze,
>> and set up the system about a month ago, so there have been some
>> upgrades. I wonder if maybe the kernel or Grub was upgraded and I
>> neglected to install Grub again, but I would expect it to
>> automatically be reinstalled on at least the first disk. If I remove
>> either disk I get the same error message.
>>
>> I did look at /proc/cmdline. It shows the same uuid for the root
>> device as in the menu, so that seems to prove it's an MD device that
>> isn't ready, since my boot and root partitions are each on MD
>> devices. /proc/modules does show md_mod.
>
> What about the actual device? Does /dev/md/0 (or /dev/md0, or
> whatever) exist?
>
> If the module is loaded but the device does not exist, then it's
> possible there's a problem with your mdadm.conf file, and the
> initramfs doesn't have the array info in it, so the array wasn't
> started.
>
> The easy way out is to boot from a rescue disk, fix the mdadm.conf
> file, rebuild the initramfs, and reboot.
>
> The Real Sysadmin way is to start the array by hand from inside the
> initramfs. You want mdadm -A /dev/md0 (or possibly mdadm -A -u
> your-uuid) to start it, and once it's up, ctrl-d out of the initramfs
> and hope. The part I don't remember is whether or not this creates
> the symlinks in /dev/disk that your root-fs-finder is looking for.
>
> It may be better to boot with break=premount to get into the
> initramfs in a more controlled state, instead of trying to fix it in
> the already-errored state, assuming you try the initramfs thing at
> all.
>
> And further assuming that the mdadm.conf file is the problem, which
> was pretty much guesswork on my part...

I found the problem. You're right, mdadm.conf was the problem, which is
amazing considering that I had previously restarted without changing
mdadm.conf. I edited it in the initramfs, then did mdadm -A /dev/md0 as
you suggested, and control-d worked. I assume I'll still have to
rebuild the initramfs; I might need handholding, but I'll google first.

I think what went wrong might interest some people, since it answers a
question I previously raised under the subject "RAID1 with multiple
partitions". There was no consensus, so I made the wrong choice.

The cause of the problem is that I set up my system under a temporary
hostname and then changed the hostname. The hostname appeared at the
end of each ARRAY line in mdadm.conf, and I didn't know whether I
should change it there, because I didn't know whether it has to match
the hostname in the current /etc/hostname, has to match the current
hostname, or is just a meaningless label. I changed it to the new
hostname at the same time that I changed the hostname, then shut down
and restarted. It booted fine. I did the same thing on another
computer, and I'm sure I restarted that one successfully several times.
So, I foolishly thought I was safe. After the power failure it wouldn't
boot.

After following your advice I was sufficiently inspired to edit
mdadm.conf back to the original hostname, mount my various md's, and
control-d. I assume I'll have to do that every time I boot until I
rebuild the initramfs.

Thank you very much. I'd already recovered everything from a backup,
but I needed to find the solution or I'd be afraid to use RAID in the
future.

David
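[For the record, the permanent fix once the system is up again is
roughly the following. A sketch for Debian Squeeze; /usr/share/mdadm/mkconf
is the helper the Debian mdadm package ships for regenerating the
config:

  # Regenerate the ARRAY lines from the arrays that are actually running:
  /usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf
  # Rebuild the initramfs so it carries the corrected mdadm.conf:
  update-initramfs -u -k all
]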
Re: Can't reboot after power failure (RAID problem?)
On Tue, Feb 1, 2011 at 10:38 AM, David Gaudine
dav...@alcor.concordia.ca wrote:
> On 11-01-31 8:47 PM, Andrew Reid wrote:
> [snip: Andrew's diagnosis, quoted in full upthread]
>
> I found the problem. You're right, mdadm.conf was the problem [...]
>
> The cause of the problem is that I set up my system under a temporary
> hostname and then changed the hostname. The hostname appeared at the
> end of each ARRAY line in mdadm.conf [...] I changed it to the new
> hostname at the same time that I changed the hostname, then shut down
> and restarted. It booted fine. [...] After the power failure it
> wouldn't boot.

If you'd like the homehost in mdadm.conf to be the same as the
hostname, you could break your boot in the initramfs and assemble the
array with

  mdadm --assemble /dev/mdX --homehost=whatever --update=homehost /dev/sdXX
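[Spelled out with concrete placeholder names, assuming md0 is built from
sda1 and sdb1 and the desired hostname is "newname":

  # From the initramfs shell (e.g. after booting with break=premount),
  # rewrite the homehost stored in the member superblocks while assembling:
  mdadm --assemble /dev/md0 --homehost=newname --update=homehost /dev/sda1 /dev/sdb1
]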
Re: Can't reboot after power failure (RAID problem?)
I posted in a panic and left out a lot of details. I'm using Squeeze,
and set up the system about a month ago, so there have been some
upgrades. I wonder if maybe the kernel or Grub was upgraded and I
neglected to install Grub again, but I would expect it to automatically
be reinstalled on at least the first disk. If I remove either disk I
get the same error message.

I did look at /proc/cmdline. It shows the same uuid for the root device
as in the menu, so that seems to prove it's an MD device that isn't
ready, since my boot and root partitions are each on MD devices.
/proc/modules does show md_mod.

David

-------- Original Message --------
Subject: Can't reboot after power failure (RAID problem?)
From: dav...@alcor.concordia.ca
Date: Mon, January 31, 2011 10:18 am
To: debian-user@lists.debian.org

My system went down because of a power failure, and now it won't start.
I use RAID 1, and I don't know if that's related to the problem. The
screen shows the following.

  Loading, please wait...
  Gave up waiting for root device.  Common problems:
   - Boot args (cat /proc/cmdline)
     - Check rootdelay= (did the system wait long enough?)
     - Check root= (did the system wait for the right device?)
   - Missing modules (cat /proc/modules; ls /dev)
  ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does
  not exist. Dropping to a shell!

I don't know if that uuid is an MD device, but it seems likely. Grub is
installed on each disk, and I previously tested the RAID 1 arrays by
unplugging each disk one at a time and was able to boot to either.

Ideas?

David
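[One quick way to test whether the missing UUID belongs to an md array
is to compare the kernel's root= argument against what the initramfs can
actually see. A sketch; blkid may or may not be available in the
BusyBox shell:

  cat /proc/cmdline        # shows the root=UUID=... the kernel was given
  ls /dev/disk/by-uuid/    # is that UUID among the symlinks that were created?
  blkid /dev/md0           # if the array exists, does it carry that UUID?
]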
Can't reboot after power failure (RAID problem?)
My system went down because of a power failure, and now it won't start.
I use RAID 1, and I don't know if that's related to the problem. The
screen shows the following.

  Loading, please wait...
  Gave up waiting for root device.  Common problems:
   - Boot args (cat /proc/cmdline)
     - Check rootdelay= (did the system wait long enough?)
     - Check root= (did the system wait for the right device?)
   - Missing modules (cat /proc/modules; ls /dev)
  ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does
  not exist. Dropping to a shell!

I don't know if that uuid is an MD device, but it seems likely. Grub is
installed on each disk, and I previously tested the RAID 1 arrays by
unplugging each disk one at a time and was able to boot to either.

Ideas?

David
Re: Can't reboot after power failure (RAID problem?)
On Monday 31 January 2011 10:51:04 dav...@alcor.concordia.ca wrote:
> I posted in a panic and left out a lot of details. I'm using Squeeze,
> and set up the system about a month ago, so there have been some
> upgrades. I wonder if maybe the kernel or Grub was upgraded and I
> neglected to install Grub again, but I would expect it to
> automatically be reinstalled on at least the first disk. If I remove
> either disk I get the same error message.
>
> I did look at /proc/cmdline. It shows the same uuid for the root
> device as in the menu, so that seems to prove it's an MD device that
> isn't ready, since my boot and root partitions are each on MD devices.
> /proc/modules does show md_mod.

What about the actual device? Does /dev/md/0 (or /dev/md0, or whatever)
exist?

If the module is loaded but the device does not exist, then it's
possible there's a problem with your mdadm.conf file, and the initramfs
doesn't have the array info in it, so the array wasn't started.

The easy way out is to boot from a rescue disk, fix the mdadm.conf
file, rebuild the initramfs, and reboot.

The Real Sysadmin way is to start the array by hand from inside the
initramfs. You want mdadm -A /dev/md0 (or possibly mdadm -A -u
your-uuid) to start it, and once it's up, ctrl-d out of the initramfs
and hope. The part I don't remember is whether or not this creates the
symlinks in /dev/disk that your root-fs-finder is looking for.

It may be better to boot with break=premount to get into the initramfs
in a more controlled state, instead of trying to fix it in the
already-errored state, assuming you try the initramfs thing at all.

And further assuming that the mdadm.conf file is the problem, which was
pretty much guesswork on my part...

-- A.

-- 
Andrew Reid / rei...@bellatlantic.net
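[Put as a sequence of commands, that in-initramfs recovery looks roughly
like this. A sketch assuming the array is md0; note that with -u, mdadm
wants the md array's own UUID, not the filesystem's:

  # At the GRUB menu, append break=premount to the kernel line, then:
  mdadm -A /dev/md0                   # assemble by name using mdadm.conf, or
  mdadm -A /dev/md0 -u <array-uuid>   # assemble by the array's UUID
  cat /proc/mdstat                    # confirm the array is up
  exit                                # (or Ctrl-D) to resume the normal boot
]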