Bug#594472: grub-pc: scary messages and very long boot time after upgrade
More observations: I upgraded udev to 161-1 from sid and did not pay attention while booting the faulty VM, so the VM sat waiting for passphrase input. When I came back and entered the passphrase, nothing happened; the VM was completely stuck and consuming 100% of a processor. I killed it, rebooted and swiftly entered the passphrase; then the boot continued as described above.

To make it short, udev version 161-1 makes no difference: it neither helped nor worsened the behaviour.

I then downgraded to 160-1 to check whether 160-1 shows the same behaviour if I do not enter the passphrase swiftly. Bingo, the system went to Guru Meditation with udev 160-1 as well. Booting only continues if the passphrase is entered in time.

Looking at CPU utilization, it is 100% until the boot process actually starts after all the /sys/devices/virtual messages. This is even the case while GRUB is waiting for input at the boot menu, which I would not have expected. But maybe the latter is normal behaviour.

In addition to the above, I added a spare drive to each md device and tested failing disks and failure situations. These all seemed to work perfectly well: the spares kicked in and the mirror was restored as expected. Booting with one drive (either hda or hdb) worked fine as well when I broke the other one before boot.

BTW: I will be more than happy to run a few sensible tests if someone tells me what they need.

Cheers
Darren

--
To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#594472: grub-pc: scary messages and very long boot time after upgrade
Below is the complete log of the boot with the kvm command line I use, shortened by approximately 5K lines, as noted inline between ### ###. Should the full log be of interest, I am happy to send it; just let me know.

This is a vanilla install with the current alpha installer; installation was as stated in my previous email.

It appears that there are three udev timeouts: the first two after 30 seconds each and the last one after 10 seconds, see below.

In addition, I have only just spotted that on shutdown it fails to bring down the LVM and the early crypto disks.

regards
Darren

---
r...@mypc:/root# kvm -hda /dev/mapper/m-luxm1 -hdb /dev/mapper/m-luxm2 -m 512 -serial stdio
Loading, please wait...
mdadm: /dev/md0 has been started with 2 drives.
mdadm: /dev/md1 has been started with 2 drives.
Volume group x not found
Skipping volume group x
Unable to find LVM volume x/root
udevadm settle - timeout of 30 seconds reached, the event queue contains:
/sys/devices/virtual/block/md0 (1690)
/sys/devices/virtual/block/md0 (1692)
/sys/devices/virtual/block/md0 (1694)
/sys/devices/virtual/block/md0 (1696)
/sys/devices/virtual/block/md0 (1698)
/sys/devices/virtual/block/md0 (1700)
/sys/devices/virtual/block/md0 (1703)
/sys/devices/virtual/block/md0 (1704)
/sys/devices/virtual/block/md0 (1706)
/sys/devices/virtual/block/md0 (1708)
### 1140 Lines deleted ###
/sys/devices/virtual/block/md1 (2901)
/sys/devices/virtual/block/md0 (2902)
/sys/devices/virtual/block/md1 (2903)
/sys/devices/virtual/block/md0 (2904)
/sys/devices/virtual/block/md1 (2905)
/sys/devices/virtual/block/md0 (2906)
/sys/devices/virtual/block/md1 (2907)
/sys/devices/virtual/block/md0 (2908)
/sys/devices/virtual/block/md1 (2909)
/sys/devices/virtual/block/md0 (2910)
/sys/devices/virtual/block/md1 (2911)
/sys/devices/virtual/block/md0 (2912)
Unlocking the disk /dev/disk/by-uuid/diskID (md1_crypt)
Enter passphrase:
2 logical volume(s) in volume group x now active
cryptsetup: md1_crypt set up successfully
udevadm settle - timeout of 30 seconds reached, the event queue contains:
/sys/devices/virtual/block/md0 (2894)
/sys/devices/virtual/block/md0 (2896)
/sys/devices/virtual/block/md0 (2898)
/sys/devices/virtual/block/md0 (2900)
/sys/devices/virtual/block/md0 (2902)
/sys/devices/virtual/block/md0 (2904)
/sys/devices/virtual/block/md0 (2906)
/sys/devices/virtual/block/md0 (2908)
### 1950 Lines deleted ###
/sys/devices/virtual/block/md0 (5118)
/sys/devices/virtual/block/md1 (5119)
/sys/devices/virtual/block/md0 (5120)
/sys/devices/virtual/block/md1 (5121)
/sys/devices/virtual/block/md0 (5122)
/sys/devices/virtual/block/md1 (5123)
udevadm settle - timeout of 10 seconds reached, the event queue contains:
/sys/devices/virtual/block/md0 (3314)
/sys/devices/virtual/block/md0 (3316)
/sys/devices/virtual/block/md0 (3318)
/sys/devices/virtual/block/md0 (3320)
/sys/devices/virtual/block/md0 (3322)
/sys/devices/virtual/block/md0 (3324)
/sys/devices/virtual/block/md0 (3326)
/sys/devices/virtual/block/md0 (3328)
### 1980 Lines deleted ###
/sys/devices/virtual/block/md1 (5563)
/sys/devices/virtual/block/md0 (5564)
/sys/devices/virtual/block/md1 (5565)
/sys/devices/virtual/block/md0 (5566)
/sys/devices/virtual/block/md1 (5567)
/sys/devices/virtual/block/md0 (5568)
/sys/devices/virtual/block/md1 (5569)
/sys/devices/virtual/block/md0 (5570)
/sys/devices/virtual/block/md1 (5571)
/sys/devices/virtual/block/md0 (5572)
kinit: No resume image, doing normal boot...
INIT: version 2.88 booting
Using makefile-style concurrent boot in runlevel S.
Starting the hotplug events dispatcher: udevd.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...done.
Setting parameters of disc: (none).
Generating udev events for MD arrays...done.
Setting preliminary keymap...done.
Checking root file system...fsck from util-linux-ng 2.17.2
/dev/mapper/x-root: clean, 76211/579360 files, 795464/2316288 blocks
done.
Starting early crypto disks...done.
Cleaning up ifupdown
Setting up networking
Loading kernel modules...done.
Setting up LVM Volume Groups
Reading all physical volumes. This may take a while...
Found volume group x using metadata type lvm2
2 logical volume(s) in volume group x now active
.
Starting remaining crypto disks...done.
Activating lvm and md swap...done.
Checking file systems...fsck from util-linux-ng 2.17.2
/dev/md0: clean, 226/60240 files, 44737/240832 blocks
done.
Mounting local filesystems...done.
Activating swapfile swap...done.
Cleaning up temporary files
Configuring network interfaces...done.
Setting kernel variables ...done.
Starting portmap daemon
Starting NFS common utilities: statd.
Cleaning up temporary files
Setting up ALSA...done (none loaded).
Setting console
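
With several thousand queue entries per timeout, it can help to condense a saved copy of the console output into a per-device count. The following is only a sketch: it assumes the serial output was captured to a text file (the file name in the usage note is hypothetical) and that queue entries look like the ones in this log, a sysfs path followed by a sequence number in parentheses. `summarize_queue` is a name made up for this illustration, and `grep -o` is not POSIX but is available in GNU grep and busybox.

```shell
#!/bin/sh
# Sketch: count how many queued udev events each device has in a captured
# "udevadm settle" timeout dump.
# Assumption: entries have the form "/sys/devices/virtual/block/md0 (1690)".
# summarize_queue is a hypothetical helper, not part of udev.
summarize_queue() {
    # -o prints each match on its own line, so this also works when
    # several entries ended up on one physical line of the capture.
    grep -o '/sys/devices/[^ ]* ([0-9]*)' "$1" \
        | sed 's/ ([0-9]*)$//' \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{ print $2 ": " $1 }'
}
```

Calling `summarize_queue boot.log` would print one line per device, busiest first, which makes it easier to see whether md0 and md1 contribute equally to the backlog.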
Bug#594472: grub-pc: scary messages and very long boot time after upgrade
So this looks a bit like there are two errors:

1) mdadm is broken in 3.1.2 and brings up these error messages.
2) Grub2 seems to be broken, as it does not start crypto and consequently fails to bring up LVM.

The only thing I don't understand is why it is only the two of us experiencing this.

To test, I have set up a virtual machine with KVM. For that I created two LVs with 10G each and started kvm with

kvm -hda /dev/mapper/testdisk1 -hdb /dev/mapper/testdisk2 -cdrom yesterdays alpha netinstall.iso

Then I partitioned both disks with two partitions:

partition 1 : 250MB
partition 2 : all remaining

With these two 'disks' I set up two raid1 sets:

md0 = (hda1, hdb1)
md1 = (hda2, hdb2)

md0 is used for /boot as ext3.
md1 I encrypted, giving md1_crypt.
md1_crypt is then used for LVM as volume group x, which I have partitioned as follows:

x-swap 1GB swap space
x-root all remaining for / as ext3

The installation process went fine without any problems (apart from some minor quirks that should be reported somewhere else).

When rebooting I get the same behaviour as what you described first: the system complains about the missing VG and comes up with a lot of /sys/devices/virtual ... messages, then asks for the disk passphrase, goes into a longer thinking period, and after another block of /sys/devices/virtual/... messages the VM actually boots without any further problems.

From the point where it complains about the missing VG right at the beginning until the actual boot process starts, the VM consumes 150% of my processors. Is something going into an endless loop there?

I use the same architecture as you do: amd64 with a dual-core processor. The VM is set up as amd64 as well. It does not make a difference whether I run the VM with one or two processors (only that with one processor the VM consumes 100% of that processor while boot is failing; it can't do more, can it? ;) )

Cheers
Darren
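
For anyone who wants to rebuild this layout outside the installer, the description above corresponds roughly to the command sequence printed by the sketch below. This is an assumption-laden illustration, not what the installer actually ran: the report used the Debian installer, the partitions are assumed to exist already, and `plan_layout` and its arguments are hypothetical. The function only prints the commands; nothing is executed.

```shell
#!/bin/sh
# Dry-run sketch: print commands that would build two RAID1 sets, with
# LUKS on md1 and an LVM volume group on top of the encrypted device.
# Assumptions: both disks already carry partitions 1 and 2; device
# names and the VG name are passed in. plan_layout is a made-up helper.
plan_layout() {
    a=$1    # first disk, e.g. /dev/sda
    b=$2    # second disk, e.g. /dev/sdb
    vg=$3   # volume group name
    cat <<EOF
mdadm --create /dev/md0 --level=1 --raid-devices=2 ${a}1 ${b}1
mdadm --create /dev/md1 --level=1 --raid-devices=2 ${a}2 ${b}2
mkfs.ext3 /dev/md0
cryptsetup luksFormat /dev/md1
cryptsetup luksOpen /dev/md1 md1_crypt
pvcreate /dev/mapper/md1_crypt
vgcreate ${vg} /dev/mapper/md1_crypt
lvcreate -L 1G -n swap ${vg}
lvcreate -l 100%FREE -n root ${vg}
mkswap /dev/${vg}/swap
mkfs.ext3 /dev/${vg}/root
EOF
}
```

Running `plan_layout /dev/sda /dev/sdb x` lists the steps in order, so the stacking (RAID1, then LUKS, then LVM) that the initramfs has to undo at boot is visible at a glance.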
Bug#594472: grub-pc: scary messages and very long boot time after upgrade
> thanks for the analysis. I'm under the impression that mdadm does assemble the raids, but then fails to recognize the partition table. But I don't know whether this is still in the realm of mdadm's responsibility, or whether some other package (linux-image-*?) should be responsible instead.

Correct, it assembles the raids and moans about an unrecognized partition table. The latter I assume is normal, since md0 (in my case) is an ext3 partition and not a disk with a partition table; the same applies to md1, which is not anything that could be recognized as a valid disk, because it is completely encrypted.

I agree it could be anything, even udev, for which I found some references. Nevertheless, mdadm 3.1.2 seems to worsen the problem, as does the current Grub2.

> > When rebooting I have the same behaviour as what you described first;
>
> Now *this* sounds dangerous: Failing to reboot for a newly-installed system. This means that re-installing a box wouldn't fix the problem, but only recreate it.

Correct, and it is reproducible, which should allow someone with more technical knowledge than me to analyse it in detail. Given that this happens on a VM, I am confident it is not hardware related. I run on a laptop with two disks and an Intel dual core, which is fairly different from your quad ;).

> My system's thinking period was ended by OOM-killer because something ate all my RAM. I already submitted messages which included tracebacks which I can't make too much sense of right now.

I could not check this; I only gave 512M to the VM and the 'thinking period' was quite short, a good part of a minute I guess.

> I didn't dare to turn the machine off since it last took so long to boot, but (of course) would like to.

I have rebooted mine several times now. I only have to help with the busybox commands to get it up and running. That seems fairly stable ;)

> I can't inspect my system very well, but from the sound of the fans, the CPU must be sizzling hot.

My fan does not quite go into loud mode; I don't know why, as at 100% CPU utilisation I would expect it. On the other hand, the machine does not feel too hot either.
Bug#594472: grub-pc: scary messages and very long boot time after upgrade
On Sun, Aug 29, 2010 at 9:38 AM, Toni Mueller supp...@oeko.net wrote:
> Hi,
>
> On Sun, 29.08.2010 at 07:18:36 +0200, Dh H dhh4...@googlemail.com wrote:
> > Downgrading MDADM helped for me, but not completely. The behaviour at the beginning is still the same, such that it does not find the volume-group. But now I am dropped into a shell instead of getting these /sys/devices/virtual/block/mdX messages.
>
> IOW, you need to manually intervene to get the machine to actually boot, but the whole process is now faster than waiting until the system figures it out on its own, right?

Actually the system does not figure it out, as it does not find my root filesystem. I now get this while booting:

mdadm: /dev/md0 has been started with 2 drives
mdadm: /dev/md1 has been started with 2 drives
Volume Group m not found
skipping volume group m
unable to find LVM volume m/root
Gave up waiting for root device
(some inapplicable hints ;) )
ALERT! /dev/mapper/m-root does not exist. Dropping into shell
Busybox
(initramfs)

In busybox I have to start the crypto drive with cryptsetup luksOpen md1 md1_crypt and then start the vg/lv with lvm vgscan vgchange -a y

When I exit from the shell, the system boots, as it finds the root device after the above.

BTW: I upgraded grub from 1.98+20100804-2 to 1.98+20100804-4 before I downgraded mdadm. This did not help by itself. Whether this is what now causes the drop into busybox, I cannot judge.
Bug#594472: grub-pc: scary messages and very long boot time after upgrade
Downgrading mdadm helped for me, but not completely. The behaviour at the beginning is still the same, in that it does not find the volume group. But now I am dropped into a shell instead of getting these /sys/devices/virtual/block/mdX messages. In that shell I can use lvm to bring the VG up and activate the LVs. From that point on booting works fine.

Toni, I used this link to get mdadm for the downgrade: http://snapshot.debian.org/package/mdadm/3.1.1-1/
Bug#594472: Some more information from me
Here is some more information from me, as I did not have much time this morning: regardless of the passphrase I enter, any is accepted (also wrong ones!!!) and the behaviour after that is the same.

I captured both the standard and the (recovery mode) boot below (please note that the below was copied using an eye-to-finger scanner).

Standard 2.6.32-5-amd64 starts with:

mdadm: /dev/md0 has been started with 2 drives.
mdadm: /dev/md1 has been started with 2 drives.
volume group m not found
skipping volume group m
Unable to find LVM volume m/root
udevadm settle - timeout of 30 seconds reached, the event queue contains :
/sys/devices/virtual/block/md0 ()
+ more of the above
Unlocking the disk /dev/disk/by-uuid/uuid (md1_crypt)
Enter Passphrase:

- From that time on the drive LED keeps flickering, even before entering the passphrase. I stopped the attempt after approx. 18 hrs of waiting, because I thought it might be rebuilding the mirror before continuing.

With the same kernel but using (recovery mode) the behaviour is slightly different. First I get normal boot messages for the first 15 seconds about CPU started, USB started (I cannot capture this using xon/xoff):

[ timestamp] sd 0:0:0:0 Attached scsi generic sg0 type 0
[ timestamp] sd 0:0:0:0 Attached scsi generic sg1 type 0
[ timestamp] sd 0:0:0:0 Attached scsi generic sg2 type 5
Begin: Loading essential drivers ... done
Begin: Running /scripts/init-premount done
Begin: Mounting root file system .
Begin: Running /scripts/local-top .
Begin: Loading
[ timestamp] md: raid1 personality registered for level 1
Success: loaded module raid1
done.
Begin: Assembling all MD arrays
[ timestamp] md: md0 stopped.
[ timestamp] md: bind sdb1
[ timestamp] md: bind sda1
[ timestamp] raid1: raid set md0 active with 2 out of 2 mirrors
[ timestamp] md0: detected capacity change from 0 to boot device size
mdadm: /dev/md0 has been started with 2 drives
[ timestamp] md0: unknown partition table
[ timestamp] md: md1 stopped.
[ timestamp] md: bind sdb2
[ timestamp] md: bind sda2
[ timestamp] raid1: raid set md0 active with 2 out of 2 mirrors
[ timestamp] md0: detected capacity change from 0 to LVM device size
mdadm: /dev/md1 has been started with 2 drives
[ timestamp] md0: unknown partition table
Success: Assembled all arrays
[ timestamp] device-mapper: uevent: Version 1.0.3
[ timestamp] device-mapper: ioctl 4.15.0-ioctl (2009-04-01) initialised: dm-de...@redhat.com
Volume Group m not found
Skipping Volume group m
Unable to find LVM volume m/root
cryptsetup: vm device name (/dev/disk/by-uuid/uuid) does not begin with /dev/mapper
cryptsetup: evms_activate is not available
Begin: Waiting for encrypted source device . done
udevadm settle - timeout of 30 seconds reached, the event queue contains
/sys/devices/virtual/block/md0 ()
+ more of the above
Unlocking the disk /dev/disk/by-uuid/uuid (md1_crypt)
Enter Passphrase:
Bug#594472: How to get to your data and back up
I managed to get to my data.

Using the Debian NetInstall CD I booted into rescue mode.
Select language, territory and keyboard as applicable to you.
Cancel network autoconfig if that does not work for you (for me that is the case), and enter the appropriate data.
Select any hostname and domain.
The system will then come up and ask for the passphrases of the individual disks; do *not* enter your passphrase there, just continue.
Select "Do not use a root file system".
Select "Execute a shell in the installer environment".

Then use the commands below (preceded by #) to bring your VG and LVs online:

# cryptsetup luksOpen /dev/mdx vg-name
Enter Passphrase: x
Key Slot x unlocked.
# vgscan
Reading all physical volumes.
Found volumegroup x .
# vgchange -a y
x logical volumes in volume group x now active
# exit

At this point the rescue mode menu reappears; select "choose a different root file system".
Select your root and execute a shell - voila, you have your data (you may need to enter 'mount -a' in the shell to get all LVs mounted).

At this point you can back up your data and reinstall, if you want to.

As for me, I have backed up and am now trying some grub2 settings. Maybe I can get it working again. I will post if I have success.

I would appreciate a little help with this.
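
The three commands from the rescue shell can also be wrapped so that the sequence stops at the first failure. This is only a sketch under stated assumptions: `unlock_and_activate` and the `RUN` variable are names invented for this illustration, and the device and mapping names must be replaced with your own. By default it only prints the commands; setting `RUN` to an empty string makes it execute them, which requires root.

```shell
#!/bin/sh
# Sketch: unlock a LUKS device and activate all volume groups, stopping
# at the first failing step.
# RUN is a hypothetical switch: unset means "echo" (print only),
# RUN="" (empty) means really execute the commands as root.
RUN=${RUN-echo}

unlock_and_activate() {
    # $1 = encrypted block device, $2 = device-mapper name to create
    $RUN cryptsetup luksOpen "$1" "$2" || return 1
    $RUN vgscan || return 1
    $RUN vgchange -a y
}
```

For example, `unlock_and_activate /dev/md1 md1_crypt` uses the names that appear in the boot logs of this thread; in print-only mode it shows the three commands without touching any device.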
Bug#594472: No success yet but
When I add insmod crypto to the grub.cfg, my system goes directly into the flickering-LED mode. Booting does not work at all. Could it be that the culprit is within crypto.mod?