Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-05-02 Thread Ben Hutchings
On Thu, 2015-04-30 at 09:41 +0200, Bernhard Schmidt wrote:
 Hi maximilian,
 
  [ copied from debian-user again ]
 
  ---
  Got another system with the symptoms and managed to get a snapshot.
 
  It is really extremely weird. The kernel output is
 
  List of all partitions:
  No filesystem could mount root, tried:
  Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
 
  This is reproducible. To fix it, it is enough to boot into the Wheezy
  kernel (even with init=/bin/sh), then reboot. It apparently does
  something to the root-fs (fsck?) which allows the Jessie kernel to boot.
 
  I have asked our Windows guys to make a screencast; it is uploaded here:
 
  http://users.birkenwald.de/~berni/volatile/783620.mkv

I can see that the GRUB menu entry for 3.16.0-4-amd64 does seem to
include an initramfs.  Unfortunately the frame rate is quite low so I
don't see any messages from GRUB indicating whether it succeeded or
failed to load the file.
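
From the Wheezy side, one way to double-check which kernel and initramfs GRUB has been told to load is to read the `linux` and `initrd` lines of the generated configuration. A minimal sketch, assuming a standard grub2 setup with its configuration in `/boot/grub/grub.cfg`; `show_boot_files` is a hypothetical helper name. It only shows what GRUB is configured to read, not what GRUB actually finds on disk at boot time.

```shell
#!/bin/sh
# Hypothetical helper: print the linux/initrd lines that the GRUB config
# uses for one kernel version. The default path is an assumption (usual
# Debian grub2 location); pass another file as the second argument.
show_boot_files() {
    version="$1"
    cfg="${2:-/boot/grub/grub.cfg}"
    # Keep only "linux ..." and "initrd ..." commands, restrict them to the
    # requested version (fixed-string match), and strip leading whitespace.
    grep -E '^[[:space:]]*(linux|initrd)[[:space:]]' "$cfg" \
        | grep -F -- "$version" \
        | sed 's/^[[:space:]]*//'
}
```

Usage would be along the lines of `show_boot_files 3.16.0-4-amd64`.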

  We still have the snapshot available, if you have an idea please drop me
  a note.
  
  this means linux didn't get the initramfs passed by the bootloader.
  
  In the old days this happened when lilo was not run; these days it could
  be some grub modules out of sync (very wild guess).
  Did you try running grub-install in that image before booting into it?
 
 I don't have access to the snapshot until Monday, but I don't think it
 will help. As you can see in the video, a simple fsck/mount in initrd in
 the old kernel is enough, and grub isn't touched there.

fsck.xfs does nothing (see the manual page).  Mounting the filesystem,
however, will replay any changes that were only written to the journal
and not yet written to their usual locations on disk.

Is it possible that this system was not cleanly shut down following the
upgrade?  I don't think that GRUB reads journals, so this would probably
explain what you've shown.

Ben.

 But I'll test on Monday to be sure.
 
 Bernhard
 

-- 
Ben Hutchings
Q.  Which is the greater problem in the world today, ignorance or apathy?
A.  I don't know and I couldn't care less.




Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-05-02 Thread Bernhard Schmidt

On 02.05.2015 21:26, Ben Hutchings wrote:

Hi,
 
 I can see that the GRUB menu entry for 3.16.0-4-amd64 does seem to 
 include an initramfs.  Unfortunately the frame rate is quite low so
 I don't see any messages from GRUB indicating whether it succeeded
 or failed to load the file.

Even directly in front of the console I can't make out any error
message, whereas if I change the filename to something non-existent, I
get an error message and have to press a key to continue.

 We still have the snapshot available, if you have an idea
 please drop me a note.
 
 this means linux didn't get the initramfs passed by the
 bootloader.
 
 In the old days this happened when lilo was not run; these days
 it could be some grub modules out of sync (very wild guess).
 Did you try running grub-install in that image before booting
 into it?
 
 I don't have access to the snapshot until Monday, but I don't
 think it will help. As you can see in the video, a simple
 fsck/mount in initrd in the old kernel is enough, and grub isn't
 touched there.
 
 fsck.xfs does nothing (see the manual page).  Mounting the
 filesystem, however, will replay any changes that were only written
 to the journal and not yet written to their usual locations on
 disk.

Should I be able to see that somehow (xfs_info from a rescue system or
something like that)? Doesn't XFS log anything when it replays the log?
(I know ext4 does, but I don't recall ever seeing that for XFS.)
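
One hedged way to tell whether a mount actually replayed the XFS log is to look at the kernel messages from that mount. Kernels of this era usually log `Starting recovery (logdev: internal)` and `Ending recovery (logdev: internal)` when XFS replays its log (older kernels used `Starting XFS recovery on filesystem: ...`), and some print `Ending clean mount` when there was nothing to replay. The exact wording differs between versions, so the patterns below are assumptions. `xfs_replayed` is a hypothetical helper that reads kernel log text, for example `dmesg` output from the Wheezy boot, on standard input.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and report whether it
# shows an XFS log replay. The matched message wording is an assumption
# based on common kernel messages; other kernel versions may differ.
xfs_replayed() {
    if grep -E 'XFS.*(Starting|Ending) recovery|Starting XFS recovery' >/dev/null; then
        echo "replayed"
    else
        echo "clean"
    fi
}
```

On a live system this would be used as `dmesg | xfs_replayed`.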

 Is it possible that this system was not cleanly shut down following
 the upgrade?  I don't think that GRUB reads journals, so this would
 probably explain what you've shown.

The system is normally rebooted using /sbin/reboot soon after
dist-upgrade is finished. There are no errors and our customizations
don't touch the reboot part at all. If there were a problem with an
unclean shutdown, it would be a common error; we are seeing ~20%
failure rates on upgrades.

If I understand you correctly, since grub isn't erroring out on the
initrd filename, the file is likely there on disk, but in an out-of-date
version. I recall the initrd being generated twice, so maybe the file
that is on the disk (as read by grub) is somehow incomplete, and the first
boot syncs the updated image from the journal to disk. Or maybe it is
really unreadable... Another guess would be corruption that can be
replayed by 3.2.0 but not by 3.16.0 (seems unlikely).

If I mount the filesystem in a rescue system with norecovery and the
initrd is either different or missing, that would narrow it down, no?
And a workaround would be calling sync before the reboot.
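
The norecovery test proposed above could go like this: mount the snapshot's root filesystem with `-o ro,norecovery` (XFS only accepts `norecovery` together with `ro`), copy the initrd out, unmount, then mount normally so the log is replayed and copy it out again. A minimal sketch of the final comparison step only; `compare_copies` and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the initrd copied from a norecovery
# (pre-replay) mount with the copy taken after a normal mount.
compare_copies() {
    pre="$1"    # copy taken from: mount -o ro,norecovery <device> <dir>
    post="$2"   # copy taken after a normal, log-replaying mount
    if [ ! -e "$pre" ]; then
        # The file did not exist before the log was replayed.
        echo "missing-before-replay"
    elif cmp -s "$pre" "$post"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

Either "different" or "missing-before-replay" would point at on-disk state that only becomes current after log replay, which GRUB would not see.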

Bernhard


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: https://lists.debian.org/55453602.2050...@birkenwald.de



Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-30 Thread Bernhard Schmidt
Hi maximilian,

 [ copied from debian-user again ]

 ---
 Got another system with the symptoms and managed to get a snapshot.

 It is really extremely weird. The kernel output is

 List of all partitions:
 No filesystem could mount root, tried:
 Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

 This is reproducible. To fix it, it is enough to boot into the Wheezy
 kernel (even with init=/bin/sh), then reboot. It apparently does
 something to the root-fs (fsck?) which allows the Jessie kernel to boot.

 I have asked our Windows guys to make a screencast; it is uploaded here:

 http://users.birkenwald.de/~berni/volatile/783620.mkv

 We still have the snapshot available, if you have an idea please drop me
 a note.
 
 this means linux didn't get the initramfs passed by the bootloader.
 
 In the old days this happened when lilo was not run; these days it could
 be some grub modules out of sync (very wild guess).
 Did you try running grub-install in that image before booting into it?

I don't have access to the snapshot until Monday, but I don't think it
will help. As you can see in the video, a simple fsck/mount in initrd in
the old kernel is enough, and grub isn't touched there.

But I'll test on Monday to be sure.

Bernhard


Archive: https://lists.debian.org/5541dc9c.7070...@birkenwald.de



Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-29 Thread Bernhard Schmidt
Hi,

[ copied from debian-user again ]

---
Got another system with the symptoms and managed to get a snapshot.

It is really extremely weird. The kernel output is

List of all partitions:
No filesystem could mount root, tried:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

This is reproducible. To fix it, it is enough to boot into the Wheezy
kernel (even with init=/bin/sh), then reboot. It apparently does
something to the root-fs (fsck?) which allows the Jessie kernel to boot.

I have asked our Windows guys to make a screencast; it is uploaded here:

http://users.birkenwald.de/~berni/volatile/783620.mkv

We still have the snapshot available, if you have an idea please drop me
a note.
---

Bernhard





Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-29 Thread Bernhard Schmidt
Package: initramfs-tools
Followup-For: Bug #783620

Hi,

from the debian-user mailing list ...


Bernhard Schmidt be...@birkenwald.de wrote:

 Don Armstrong d...@debian.org wrote:

 Hi Don,

 Has anyone observed something similar to
 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783620 on their
 upgrade from Wheezy to Jessie? I'm still trying to figure out what's
 happening, and I don't really know where to look.

 I was unable to attach the screenshot so far (mail is accepted but
 never makes it to the BTS); I've put the screenshot here:

 http://users.birkenwald.de/~berni/volatile/783620.png

 Could you run something like this on the initrds?

 diff -u <(zcat workinginitrd) <(zcat brokeninitrd);

 It's possible that something has corrupted the initrds in some subtle
 way, or some part of the cpio archive has been truncated, which causes an
 issue for the kernel but is ignored by cpio.

Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-29 Thread maximilian attems
On Wed, Apr 29, 2015 at 01:06:07PM +0200, Bernhard Schmidt wrote:
 Hi,
 
 [ copied from debian-user again ]
 
 ---
 Got another system with the symptoms and managed to get a snapshot.
 
 It is really extremely weird. The kernel output is
 
 List of all partitions:
 No filesystem could mount root, tried:
 Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
 
 This is reproducible. To fix it, it is enough to boot into the Wheezy
 kernel (even with init=/bin/sh), then reboot. It apparently does
 something to the root-fs (fsck?) which allows the Jessie kernel to boot.
 
 I have asked our Windows guys to make a screencast; it is uploaded here:
 
 http://users.birkenwald.de/~berni/volatile/783620.mkv
 
 We still have the snapshot available, if you have an idea please drop me
 a note.

this means linux didn't get the initramfs passed by the bootloader.

In the old days this happened when lilo was not run; these days it could
be some grub modules out of sync (very wild guess).
Did you try running grub-install in that image before booting into it?


Archive: https://lists.debian.org/20150429144533.gd10...@stro.at



Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-28 Thread Ben Hutchings
On Tue, 2015-04-28 at 21:39 +0200, Bernhard Schmidt wrote:
 Hi,
 
 I have tried two times to send the screenshot to this bug, but it was
 always eaten (delivered to @bugs.debian.org, but never made it to the
 BTS). I have put it online at
 http://users.birkenwald.de/~berni/volatile/783620.png
 
 Note that there is a bit of local integration work in these systems (a
 few additional packages, and the upgrade procedure switches from the
 legacy VMware tools to open-vm-tools), but nothing so deep that it should
 affect the initramfs. Also 90% of the upgrades go through without any issues.
 
 And the initrd content is binary-identical, so ...

This is a kernel panic, which usually means the initramfs wasn't loaded
at all.
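
When a boot log can be captured (for example over a serial console), a boot that does receive an initramfs from the boot loader normally shows `Trying to unpack rootfs image as initramfs...` and later `Freeing initrd memory: ...`, whereas a boot without one reaches the root-mount panic with neither line. The message wording is an assumption based on typical 3.x kernels; `initramfs_seen` is a hypothetical helper that reads kernel log text on standard input.

```shell
#!/bin/sh
# Hypothetical helper: classify kernel log text on stdin by whether an
# external initramfs was unpacked. Message wording is an assumption and
# may differ between kernel versions.
initramfs_seen() {
    if grep -E 'Trying to unpack rootfs image as initramfs|Freeing initrd memory' >/dev/null; then
        echo "initramfs loaded"
    else
        echo "no initramfs messages"
    fi
}
```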

Which boot loader is used on this system?  GRUB or something else?

Ben.

-- 
Ben Hutchings
Beware of programmers who carry screwdrivers. - Leonard Brandwein




Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-28 Thread Bernhard Schmidt
Package: initramfs-tools
Version: 0.120
Severity: important

Dear Maintainer,

I have a hard time wrapping my head around this bug; feel free to reassign it
elsewhere.

We have started upgrading some of our production VMs to Jessie. The test
systems worked fine, but I have now hit the following bug for the second time
on a production VM.

- dist-upgrade works flawlessly
- on first boot into Jessie I get an immediate (1s) kernel panic (see attached
  screenshot) about being unable to find the root fs. Unfortunately I'm unable to
  get the full boot log, since I don't have a serial console there and kernel
  messages scroll by too fast.
- To fix the issue I have to boot into the old Wheezy kernel (3.2.0-4-amd64) in
  grub and regenerate the initrd for the Jessie kernel:

# update-initramfs -k 3.16.0-4-amd64 -u

  Then it works fine.

Now comes the interesting part... I have saved the broken initrd for later
analysis.

The compressed size is marginally different (the broken one is 301 bytes smaller):

-rw-r--r--  1 root root 14339199 Apr 28 13:59 initrd.img-3.16.0-4-amd64
-rw-r--r--  1 root root 14338898 Apr 28 13:58 initrd.img-3.16.0-4-amd64.broken

The uncompressed size is the same

root@lxmhs63:/tmp# zcat /boot/initrd.img-3.16.0-4-amd64.broken > initrd.img-3.16.0-4-amd64.broken
root@lxmhs63:/tmp# zcat /boot/initrd.img-3.16.0-4-amd64.broken > /tmp/initrd.img-3.16.0-4-amd64.broken
root@lxmhs63:/tmp# ls -la /tmp/initrd.img-3.16.0-4-amd64*
-rw-r--r-- 1 root root 45304832 Apr 28 14:44 /tmp/initrd.img-3.16.0-4-amd64
-rw-r--r-- 1 root root 45304832 Apr 28 14:44 /tmp/initrd.img-3.16.0-4-amd64.broken

The checksum is different

root@lxmhs63:/tmp# md5sum /tmp/initrd.img-3.16.0-4-amd64*
7b24aa901b697dc5dfdbad03bd199072  /tmp/initrd.img-3.16.0-4-amd64
5e467c0a49afa4ddae315cc6e818d7ac  /tmp/initrd.img-3.16.0-4-amd64.broken
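
The three checks above (compressed size, uncompressed size, checksum of the uncompressed image) can be collected per image in one go. A minimal sketch, assuming gzip-compressed images as in this report; `initrd_report` is a hypothetical helper that prints the compressed size in bytes, the uncompressed size in bytes and the MD5 of the uncompressed data.

```shell
#!/bin/sh
# Hypothetical helper: print "<compressed bytes> <uncompressed bytes> <md5>"
# for one gzip-compressed initramfs image (md5 is of the uncompressed data).
initrd_report() {
    compressed=$(wc -c < "$1" | tr -d ' ')
    uncompressed=$(zcat "$1" | wc -c | tr -d ' ')
    sum=$(zcat "$1" | md5sum | cut -d' ' -f1)
    echo "$compressed $uncompressed $sum"
}
```

Running it on `/boot/initrd.img-3.16.0-4-amd64` and on the saved `.broken` copy and comparing the two lines gives the same picture as above: with the sizes listed there, the compressed images differ by 14339199 - 14338898 = 301 bytes, the uncompressed sizes match, and the checksums do not.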

Now comes the puzzling part ... the _content_ of the initrd is exactly the same

root@lxmhs63:/tmp# mkdir broken && cd broken && cpio -id < ../initrd.img-3.16.0-4-amd64.broken
88486 blocks
root@lxmhs63:/tmp/broken# cd ..
root@lxmhs63:/tmp# mkdir ok && cd ok && cpio -id < ../initrd.img-3.16.0-4-amd64
88486 blocks
root@lxmhs63:/tmp/ok# cd ..
root@lxmhs63:/tmp# diff -urN broken ok
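
Since the extracted files compare equal while the uncompressed images do not, the differing bytes presumably sit in the cpio metadata (for example header timestamps or member order) rather than in file contents; that is an inference, not something shown here. `cmp -l` on the decompressed images lists every differing byte with its offset and would show where the differences are. A minimal sketch, assuming gzip-compressed images as in this report; `initrd_bytediff` is a hypothetical helper name. If the decompressed images had different lengths, `cmp` would only compare their common prefix.

```shell
#!/bin/sh
# Hypothetical helper: compare two gzip-compressed initramfs images byte by
# byte. Prints "<number of differing bytes> <first differing offset>"
# (1-based, as reported by cmp), or "0 -" if the images are identical.
initrd_bytediff() {
    tmp=$(mktemp -d) || return 1
    if ! zcat "$1" > "$tmp/a" || ! zcat "$2" > "$tmp/b"; then
        rm -rf "$tmp"
        return 1
    fi
    # cmp -l prints one line per differing byte: offset, then the octal byte
    # values from each file. It exits 1 when the files differ, hence "|| true";
    # an "EOF on ..." notice for unequal lengths goes to stderr and is dropped.
    cmp -l "$tmp/a" "$tmp/b" > "$tmp/diff" 2>/dev/null || true
    count=$(wc -l < "$tmp/diff" | tr -d ' ')
    first=$(awk 'NR == 1 { print $1 }' "$tmp/diff")
    rm -rf "$tmp"
    echo "$count ${first:--}"
}
```

Usage would be along the lines of `initrd_bytediff /boot/initrd.img-3.16.0-4-amd64 /boot/initrd.img-3.16.0-4-amd64.broken`.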

I will try to capture a screen log on the next upgrades; maybe there is
something interesting in there.

Bernhard


Archive: https://lists.debian.org/20150428125306.20837.15594.report...@badwlrz-clbsc01.ws.lrz.de



Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-28 Thread Bernhard Schmidt


 The uncompressed size is the same
 
 root@lxmhs63:/tmp# zcat /boot/initrd.img-3.16.0-4-amd64.broken > initrd.img-3.16.0-4-amd64.broken
 root@lxmhs63:/tmp# zcat /boot/initrd.img-3.16.0-4-amd64.broken > /tmp/initrd.img-3.16.0-4-amd64.broken
 root@lxmhs63:/tmp# ls -la /tmp/initrd.img-3.16.0-4-amd64*
 -rw-r--r-- 1 root root 45304832 Apr 28 14:44 /tmp/initrd.img-3.16.0-4-amd64
 -rw-r--r-- 1 root root 45304832 Apr 28 14:44 /tmp/initrd.img-3.16.0-4-amd64.broken

Err, wrong paste. The commands were:

zcat /boot/initrd.img-3.16.0-4-amd64.broken > /tmp/initrd.img-3.16.0-4-amd64.broken
zcat /boot/initrd.img-3.16.0-4-amd64 > /tmp/initrd.img-3.16.0-4-amd64

Bernhard


Archive: https://lists.debian.org/553f850a.10...@birkenwald.de



Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-28 Thread Bernhard Schmidt

Hi Ben,

On 28.04.2015 22:01, Ben Hutchings wrote:
 On Tue, 2015-04-28 at 21:39 +0200, Bernhard Schmidt wrote:
 Hi,
 
 I have tried two times to send the screenshot to this bug, but it
 was always eaten (delivered to @bugs.debian.org, but never made
 it to the BTS). I have put it online at 
 http://users.birkenwald.de/~berni/volatile/783620.png
 
 Note that there is a bit of local integration work in these
 systems (a few additional packages, and the upgrade procedure
 switches from the legacy VMware tools to open-vm-tools), but
 nothing so deep that it should affect the initramfs. Also 90% of the
 upgrades go through without any issues.
 
 And the initrd content is binary-identical, so ...
 
 This is a kernel panic, which usually means the initramfs wasn't
 loaded at all.
 
 Which boot loader is used on this system?  GRUB or something else?

Thanks for the feedback.

Standard grub2 installation, nothing special about it. Grub did not
print any errors about not finding a file and I only ran
update-initramfs and rebooted.

I hope the next time it happens it will be with a less critical
machine and I can keep it down a bit to debug further.

Best Regards,
Bernhard


Archive: https://lists.debian.org/553feddc.4050...@birkenwald.de



Bug#783620: initramfs-tools: initramfs broken on first boot into Jessie, Unable to mount root fs on unknown-block(0, 0)

2015-04-28 Thread Bernhard Schmidt
Hi,

I have tried two times to send the screenshot to this bug, but it was
always eaten (delivered to @bugs.debian.org, but never made it to the
BTS). I have put it online at
http://users.birkenwald.de/~berni/volatile/783620.png

Note that there is a bit of local integration work in these systems (a
few additional packages, and the upgrade procedure switches from the
legacy VMware tools to open-vm-tools), but nothing so deep that it should
affect the initramfs. Also 90% of the upgrades go through without any issues.

And the initrd content is binary-identical, so ...

Best Regards,
Bernhard