Re: Can't reboot after power failure (RAID problem?)

2011-02-02 Thread David Gaudine

On 11-01-31 8:47 PM, Andrew Reid wrote:


   The easy way out is to boot from a rescue disk, fix the mdadm.conf
file, rebuild the initramfs, and reboot.

   The Real Sysadmin way is to start the array by hand from inside
the initramfs.  You want mdadm -A /dev/md0 (or possibly
mdadm -A -u your-uuid) to start it, and once it's up, ctrl-d out
of the initramfs and hope.  The part I don't remember is whether or
not this creates the symlinks in /dev/disk that your root-fs-finder
is looking for.


All's well.  After the Real Sysadmin way got me into the system 
one-time-only, I could do the easy way which is more permanent without 
needing a rescue disk.  Thank you so much.
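
[Editor's note: a minimal sketch of the "easy way" rebuild on Debian, assuming the system is now booted (via the hand-assembled array or a rescue disk); device names and paths are illustrative.]

```shell
# Regenerate ARRAY lines from the currently running arrays
# (review the output before putting it in /etc/mdadm/mdadm.conf):
mdadm --detail --scan

# Once mdadm.conf matches reality, rebuild the initramfs
# for all installed kernels so the fix survives a reboot:
update-initramfs -u -k all
```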


I have one more question, just out of curiosity, so bottom priority.  
Why does this work?  mdadm.conf is in the initramfs which is in /boot 
which is on /dev/md0, but /dev/md0 doesn't exist until the arrays are 
assembled, which requires mdadm.conf.


David


--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Archive: http://lists.debian.org/4d49816b.1040...@alcor.concordia.ca



Re: Can't reboot after power failure (RAID problem?)

2011-02-02 Thread Boyd Stephen Smith Jr.
In 4d49816b.1040...@alcor.concordia.ca, David Gaudine wrote:
I have one more question, just out of curiosity, so bottom priority.
Why does this work?  mdadm.conf is in the initramfs which is in /boot
which is on /dev/md0, but /dev/md0 doesn't exist until the arrays are
assembled, which requires mdadm.conf.

Finding the initramfs on disk and copying it into RAM is not actually done by 
the kernel.  It is done by the boot loader, the same way the boot loader finds 
the kernel image on disk and copies it into RAM.

As such, it doesn't use kernel features to load the initramfs.  There are a 
number of techniques boot loaders use to pull off this magic.  
GRUB normally uses the gap between the partition table and the first partition 
to store enough modules to emulate the kernel's dm/md layer and one or more of 
the kernel's file system modules in order to do the loading.  If those modules 
are not available or not in sync with how the kernel handles things, GRUB 
could fail to read the kernel image or initramfs or it could think it read 
both and transfer control to a kernel that is just random data from the 
disk.
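
[Editor's note: a hedged sketch of keeping GRUB's embedded core image in sync on a GRUB 2 (grub-pc) system like Squeeze; disk names are illustrative.]

```shell
# Ask GRUB which abstraction modules it needs to reach /boot
# (on md RAID this typically reports an mdraid variant):
grub-probe --target=abstraction /boot

# Re-embed the core image (with those modules) in the gap after the
# partition table, on each member of the RAID-1 set:
grub-install /dev/sda
grub-install /dev/sdb
```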
-- 
Boyd Stephen Smith Jr.                ,= ,-_-. =.
b...@iguanasuicide.net                ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy      `-'(. .)`-'
http://iguanasuicide.net/                 \_/




Re: Can't reboot after power failure (RAID problem?)

2011-02-01 Thread Pascal Hambourg
Hello,

dav...@alcor.concordia.ca a écrit :
 My system went down because of a power failure, and now it won't start.  I
 use RAID 1, and I don't know if that's related to the problem.  The screen
 shows the following.
 
 Loading, please wait...
 Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
 ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does not exist.
 Dropping to a shell!
 
 I don't know if that uuid is an MD device, but it seems likely.  Grub is
 installed on each disk, and I previously tested the RAID 1 arrays by
 unplugging each disk one at a time and was able to boot to either.

The kernel and initramfs started, so grub did its job and does not seem
to be the problem.

Are the disks, partitions and RAID devices present in /proc/partitions
and /dev/ ? What does /proc/mdstat contain ? Any related messages in dmesg ?
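
[Editor's note: the checks above, written out as commands that can be run from the initramfs (busybox) shell.]

```shell
cat /proc/partitions            # are the sdX, sdXN and mdN devices listed?
ls /dev/md* /dev/sd*            # do the corresponding device nodes exist?
cat /proc/mdstat                # which arrays are assembled, and in what state?
dmesg | grep -i -e md -e raid   # any RAID-related kernel messages?
```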


-- 
Archive: http://lists.debian.org/4d47cb86.6030...@plouf.fr.eu.org



Re: Can't reboot after power failure (RAID problem?)

2011-02-01 Thread David Gaudine

On 11-01-31 8:47 PM, Andrew Reid wrote:

On Monday 31 January 2011 10:51:04 dav...@alcor.concordia.ca wrote:

I posted in a panic and left out a lot of details.  I'm using Squeeze, and
set up the system about a month ago, so there have been some upgrades.  I
wonder if maybe the kernel or Grub was upgraded and I neglected to install
Grub again, but I would expect it to automatically be reinstalled on at
least the first disk.  If I remove either disk I get the same error
message.

I did look at /proc/cmdline.  It shows the same uuid for the root device
as in the menu, so that seems to prove it's an MD device that isn't ready
since my boot and root partitions are each on MD devices.  /proc/modules
does show md_mod.

   What about the actual device?  Does /dev/md/0 (or /dev/md0, or whatever)
exist?

   If the module is loaded but the device does not exist, then it's possible
there's a problem with your mdadm.conf file, and the initramfs doesn't
have the array info in it, so it wasn't started.

   The easy way out is to boot from a rescue disk, fix the mdadm.conf
file, rebuild the initramfs, and reboot.

   The Real Sysadmin way is to start the array by hand from inside
the initramfs.  You want mdadm -A /dev/md0 (or possibly
mdadm -A -u your-uuid) to start it, and once it's up, ctrl-d out
of the initramfs and hope.  The part I don't remember is whether or
not this creates the symlinks in /dev/disk that your root-fs-finder
is looking for.

   It may be better to boot with break=premount to get into the
initramfs in a more controlled state, instead of trying to fix it
in the already-error-ed state, assuming you try the initramfs
thing at all.

   And further assuming that the mdadm.conf file is the problem,
which was pretty much guesswork on my part...

-- A.


I found the problem.  You're right, mdadm.conf was the problem, which is 
amazing considering that I had previously restarted without changing 
mdadm.conf.  I edited it in the initramfs, then did mdadm -A /dev/md0 
as you suggested and control-d worked.  I assume I'll still have to 
rebuild the initramfs; I might need handholding, but I'll google first.


I think what went wrong might interest some people, since it answers a 
question I previously raised under the subject

RAID1 with multiple partitions
There was no consensus, so I made the wrong choice.

The cause of the problem is, I set up my system under a temporary 
hostname and then changed the hostname.  The hostname appeared at the 
end of each ARRAY line in mdadm.conf, and I didn't know whether I should 
change it there because I didn't know whether it has to match the 
hostname in the current /etc/hosts, has to match the current 
hostname, or is just a meaningless label.  I changed it to the new 
hostname at the same time that I changed the hostname, then shut down 
and restarted.  It booted fine.  I did the same thing on another 
computer, and I'm sure I restarted that one successfully several times.  
So, I foolishly thought I was safe.  After the power failure it wouldn't 
boot.  After following your advice I was sufficiently inspired to edit 
mdadm.conf back to the original hostname, mount my various md's, and 
control-d.  I assume I'll have to do that every time I boot until I 
rebuild the initramfs.
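
[Editor's note: for reference, on 1.x-metadata arrays the hostname David describes is stored as the "homehost" prefix of the array's name, and shows up in mdadm.conf like this; every value below is illustrative, including the UUID.]

```
ARRAY /dev/md0 metadata=1.2 name=myhost:0 UUID=aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
```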


Thank you very much.  I'd already recovered everything from a backup, 
but I needed to find the solution or I'd be afraid to use RAID in future.


David


--

Archive: http://lists.debian.org/4d4828e7.9030...@alcor.concordia.ca



Re: Can't reboot after power failure (RAID problem?)

2011-02-01 Thread Tom H
On Tue, Feb 1, 2011 at 10:38 AM, David Gaudine
dav...@alcor.concordia.ca wrote:
 On 11-01-31 8:47 PM, Andrew Reid wrote:
 On Monday 31 January 2011 10:51:04 dav...@alcor.concordia.ca wrote:

 I posted in a panic and left out a lot of details.  I'm using Squeeze,
 and
 set up the system about a month ago, so there have been some upgrades.  I
 wonder if maybe the kernel or Grub was upgraded and I neglected to
 install
 Grub again, but I would expect it to automatically be reinstalled on at
 least the first disk.  If I remove either disk I get the same error
 message.

 I did look at /proc/cmdline.  It shows the same uuid for the root device
 as in the menu, so that seems to prove it's an MD device that isn't ready
 since my boot and root partitions are each on MD devices.  /proc/modules
 does show md_mod.

   What about the actual device?  Does /dev/md/0 (or /dev/md0, or whatever)
 exist?

   If the module is loaded but the device does not exist, then it's
 possible
 there's a problem with your mdadm.conf file, and the initramfs doesn't
 have the array info in it, so it wasn't started.

   The easy way out is to boot from a rescue disk, fix the mdadm.conf
 file, rebuild the initramfs, and reboot.

   The Real Sysadmin way is to start the array by hand from inside
 the initramfs.  You want mdadm -A /dev/md0 (or possibly
 mdadm -A -u your-uuid) to start it, and once it's up, ctrl-d out
 of the initramfs and hope.  The part I don't remember is whether or
 not this creates the symlinks in /dev/disk that your root-fs-finder
 is looking for.

   It may be better to boot with break=premount to get into the
 initramfs in a more controlled state, instead of trying to fix it
 in the already-error-ed state, assuming you try the initramfs
 thing at all.

   And further assuming that the mdadm.conf file is the problem,
 which was pretty much guesswork on my part...

                                        -- A.

 I found the problem.  You're right, mdadm.conf was the problem, which is
 amazing considering that I had previously restarted without changing
 mdadm.conf.  I edited it in the initramfs, then did mdadm -A /dev/md0 as
 you suggested and control-d worked.  I assume I'll still have to rebuild the
 initramfs; I might need handholding, but I'll google first.

 I think what went wrong might interest some people, since it answers a
 question I previously raised under the subject
 RAID1 with multiple partitions
 There was no consensus, so I made the wrong choice.

 The cause of the problem is, I set up my system under a temporary hostname
 and then changed the hostname.  The hostname appeared at the end of each
 ARRAY line in mdadm.conf, and I didn't know whether I should change it there
 because I didn't know whether it has to match the hostname in the
 current /etc/hosts, has to match the current hostname, or is just a
 meaningless label.  I changed it to the new hostname at the same time that I
 changed the hostname, then shut down and restarted.  It booted fine.  I did
 the same thing on another computer, and I'm sure I restarted that one
 successfully several times.  So, I foolishly thought I was safe.  After the
 power failure it wouldn't boot.  After following your advice I was
 sufficiently inspired to edit mdadm.conf back to the original hostname,
 mount my various md's, and control-d.  I assume I'll have to do that every
 time I boot until I rebuild the initramfs.

 Thank you very much.  I'd already recovered everything from a backup, but I
 needed to find the solution or I'd be afraid to use RAID in future.

If you'd like to have homehost in mdadm.conf be the same as the
hostname, you could break your boot in initramfs and assemble the
array with
mdadm --assemble /dev/mdX --homehost=whatever --update=homehost /dev/sdXX.
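
[Editor's note: a hedged sketch of Tom's suggestion in context; boot with the kernel parameter break=premount to get a shell before the root filesystem is mounted, then assemble with an updated homehost. Device names are illustrative, and the initramfs shell may not know the final hostname, so spell it out.]

```shell
# Assemble the array, rewriting the homehost stored in the superblock:
mdadm --assemble /dev/md0 --homehost=mynewhost --update=homehost /dev/sda1 /dev/sdb1

# After booting, regenerate the ARRAY lines (now reflecting the new
# homehost) for /etc/mdadm/mdadm.conf, then rebuild the initramfs:
mdadm --detail --scan
```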


--
Archive: 
http://lists.debian.org/aanlktimrinhk1bo+-6rj-vgzmqq-jesvbvfc8fhfg...@mail.gmail.com



Re: Can't reboot after power failure (RAID problem?)

2011-01-31 Thread davidg
I posted in a panic and left out a lot of details.  I'm using Squeeze, and
set up the system about a month ago, so there have been some upgrades.  I
wonder if maybe the kernel or Grub was upgraded and I neglected to install
Grub again, but I would expect it to automatically be reinstalled on at
least the first disk.  If I remove either disk I get the same error
message.

I did look at /proc/cmdline.  It shows the same uuid for the root device
as in the menu, so that seems to prove it's an MD device that isn't ready
since my boot and root partitions are each on MD devices.  /proc/modules
does show md_mod.

David

 Original Message 
Subject: Can't reboot after power failure (RAID problem?)
From:dav...@alcor.concordia.ca
Date:Mon, January 31, 2011 10:18 am
To:  debian-user@lists.debian.org
--

My system went down because of a power failure, and now it won't start.  I
use RAID 1, and I don't know if that's related to the problem.  The screen
shows the following.

Loading, please wait...
Gave up waiting for root device.  Common problems:
- Boot args (cat /proc/cmdline)
  - Check rootdelay= (did the system wait long enough?)
  - Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does not exist.
Dropping to a shell!

I don't know if that uuid is an MD device, but it seems likely.  Grub is
installed on each disk, and I previously tested the RAID 1 arrays by
unplugging each disk one at a time and was able to boot to either.

Ideas?

David




-- 
Archive: 
http://lists.debian.org/4dddf59c3040d675d7baf7eceb707f83.squir...@webmail.concordia.ca



Can't reboot after power failure (RAID problem?)

2011-01-31 Thread davidg
My system went down because of a power failure, and now it won't start.  I
use RAID 1, and I don't know if that's related to the problem.  The screen
shows the following.

Loading, please wait...
Gave up waiting for root device.  Common problems:
- Boot args (cat /proc/cmdline)
  - Check rootdelay= (did the system wait long enough?)
  - Check root= (did the system wait for the right device?)
- Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/disk/by-uuid/47173345-34e3-4ab3-98b5-f39e80424191 does not exist.
Dropping to a shell!

I don't know if that uuid is an MD device, but it seems likely.  Grub is
installed on each disk, and I previously tested the RAID 1 arrays by
unplugging each disk one at a time and was able to boot to either.

Ideas?

David



-- 
Archive: 
http://lists.debian.org/8f87441c861b61e6f6bf597949f5bdc0.squir...@webmail.concordia.ca



Re: Can't reboot after power failure (RAID problem?)

2011-01-31 Thread Andrew Reid
On Monday 31 January 2011 10:51:04 dav...@alcor.concordia.ca wrote:
 I posted in a panic and left out a lot of details.  I'm using Squeeze, and
 set up the system about a month ago, so there have been some upgrades.  I
 wonder if maybe the kernel or Grub was upgraded and I neglected to install
 Grub again, but I would expect it to automatically be reinstalled on at
 least the first disk.  If I remove either disk I get the same error
 message.

 I did look at /proc/cmdline.  It shows the same uuid for the root device
 as in the menu, so that seems to prove it's an MD device that isn't ready
 since my boot and root partitions are each on MD devices.  /proc/modules
 does show md_mod.

  What about the actual device?  Does /dev/md/0 (or /dev/md0, or whatever)
exist?  

  If the module is loaded but the device does not exist, then it's possible
there's a problem with your mdadm.conf file, and the initramfs doesn't
have the array info in it, so it wasn't started.

  The easy way out is to boot from a rescue disk, fix the mdadm.conf
file, rebuild the initramfs, and reboot.

  The Real Sysadmin way is to start the array by hand from inside
the initramfs.  You want mdadm -A /dev/md0 (or possibly
mdadm -A -u your-uuid) to start it, and once it's up, ctrl-d out
of the initramfs and hope.  The part I don't remember is whether or
not this creates the symlinks in /dev/disk that your root-fs-finder
is looking for.

  It may be better to boot with break=premount to get into the 
initramfs in a more controlled state, instead of trying to fix it 
in the already-error-ed state, assuming you try the initramfs 
thing at all.

  And further assuming that the mdadm.conf file is the problem,
which was pretty much guesswork on my part...
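
[Editor's note: the "Real Sysadmin way" above, sketched as commands; this assumes Debian's busybox initramfs shell and a root array at /dev/md0, with names illustrative.]

```shell
# 1. Boot with the kernel parameter break=premount to get a clean
#    initramfs shell before root-mounting is attempted.
# 2. Assemble the array by hand:
mdadm -A /dev/md0            # or: mdadm -A /dev/md0 -u <your-uuid>
cat /proc/mdstat             # confirm the array came up
# 3. Leave the shell (equivalent of ctrl-d) and let the boot continue:
exit
```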

-- A.  
-- 
Andrew Reid / rei...@bellatlantic.net


-- 
Archive: http://lists.debian.org/201101312047.53519.rei...@bellatlantic.net