[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2020-02-25 Thread Eddie Campbell
Christian, was this ever resolved?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1679208

Title:
  Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with
  intel_iommu=on

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  TL;DR
  - one of our HP ProLiant DL360 Gen9 fails to boot with intel_iommu=on
  - the Disk controller fails
  - Xenial seems to work for a while but then fails
  - Zesty 100% crashes on boot
  - An identical system seems to work, so need HW replace to finally confirm

  After a reboot the HW reports this on boot:
  Embedded RAID : Smart HBA H240ar Controller - Operation Failed
   - 1719-Slot 0 Drive Array  - A controller failure event occurred prior
 to this power-up. (Previous lock up code = 0x13)

  
  I tried several things (always redeploying Zesty with MAAS in between).
  I think my debugging might be helpful, so I wanted to keep the documentation 
in the bug in case you go another route or others find useful information in 
here.

  0. I retried what I did twice, fully reproducible
 That is:
 0.1 install zesty 
 0.2 change grub default cmdline in /etc/default/grub.d/50- to add 
intel_iommu=on
 0.3 sudo update-grub
 0.4 reboot


  1. I tried a Recovery boot from the boot options in grub.
 => Failed as well


  2. Rebooted through iLO via "request reboot" and also via "full system reset"
 => both Failed


  3. Reboot the system as deployed by MAAS
 # /proc/cmdline before that
 BOOT_IMAGE=/boot/vmlinuz-4.10.0-14-generic 
root=UUID=2137c19a-d441-43fa-82e2-f2b7e3b2727b ro
 The orig grub.cfg is like http://paste.ubuntu.com/24305945/
 It reboots as-is.
 => Reboot worked
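
 The /proc/cmdline check from step 3 can be scripted, so that after each 
reboot it is easy to confirm whether a given argument really reached the 
kernel. This is only a minimal sketch: has_kernel_arg is a hypothetical helper 
name, and the optional second parameter (a file to read instead of 
/proc/cmdline) exists only to make it testable.

```shell
# Hypothetical helper, not part of the original report.
# Usage: has_kernel_arg ARG [CMDLINE_FILE]
# Returns 0 if ARG appears as a whole word on the kernel command line.
has_kernel_arg() {
    # One word per line, then an exact fixed-string match on a whole line.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}
```

 For example, "has_kernel_arg intel_iommu=on && echo enabled" prints 
"enabled" only when the argument is on the running kernel's command line.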


  4. without a change to anything in /etc run update-grub
 $ sudo update-grub
 Generating grub configuration file ...
 Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT 
is set is no longer supported.
 Found linux image: /boot/vmlinuz-4.10.0-14-generic
 Found initrd image: /boot/initrd.img-4.10.0-14-generic
 Adding boot menu entry for EFI firmware configuration
 done

 There was no diff between the new grub.cfg and the one I saved.
 => Reboot worked


  5. add the intel_iommu=on arg
 $ sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT=""/GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"/' /etc/default/grub.d/50-curtin-settings.cfg
 $ sudo update-grub
 # Diff in grub.cfg really only is the iommu setting
 => Reboot Failed
 So this doesn't seem so much of a cloud-init/curtin/maas bug anymore to me 
 - maybe intel_iommu behaves differently?
   - Check grub cfg pre/post - no change but the expected one?
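
 The sed call in step 5 only matches when GRUB_CMDLINE_LINUX_DEFAULT is 
empty. A minimal sketch of a variant that also appends to a non-empty value 
and does nothing when the argument is already there is shown here. 
add_cmdline_arg and its aca_* variables are hypothetical names, and the sketch 
assumes the argument contains no "/" or regex metacharacters and that the 
caller may write the file (for example, running as root).

```shell
# Hypothetical helper, not part of the original report.
# Usage: add_cmdline_arg FILE ARG
add_cmdline_arg() {
    aca_file=$1
    aca_arg=$2
    # Already present as a whole word: nothing to do.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"(.* )?${aca_arg}( .*)?\"\$" "$aca_file"; then
        return 0
    fi
    aca_tmp=$(mktemp) || return 1
    # 1st expression: put " ARG" in front of the closing quote.
    # 2nd expression: drop the leading space left behind by an empty value.
    sed -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/s/\"\$/ ${aca_arg}\"/" \
        -e "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"\) /\1/" \
        "$aca_file" > "$aca_tmp" && cat "$aca_tmp" > "$aca_file"
    aca_status=$?
    rm -f "$aca_tmp"
    return "$aca_status"
}
```

 As in step 5, update-grub still has to be run afterwards. A file without a 
GRUB_CMDLINE_LINUX_DEFAULT line is left unchanged by this sketch.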


  6. Install Xenial and do the same
 => Reboot working


  7. Upgrade to Z
 Since the Xenial system just worked, and one can assume that almost only the 
kernel is active so early in the boot process, I upgraded the working system 
with intel_iommu=on to Zesty.
 That would be 4.4.0-71-generic to 4.10.0-1
 On this upgrade I finally saw my I/O errors again :-/
 Note: these issues are hard to miss as they mount root as read-only.
 I wonder if they only ever appear with intel_iommu=on, as this is the only 
combo in which I ever saw them.


  8. Redeploy and upgrade to Z without intel_iommu=on enabled
 Then enable intel_iommu=on and reboot
 => Reboot Fail
 From here I rebooted into the Xenial kernel (which was still there since 
this was an upgrade)
 Here I saw:
  Loading Linux 4.4.0-71-generic ...
  Loading initial ramdisk ...
  error: invalid video mode specification `text'.
  Booting in blind mode
 Hrm, as outlined above the "blind mode" might be a red herring, but since 
this kernel worked before it might still be a red herring that swims in the 
initrd that got regenerated on the upgrade.
 => Xenial Kernel Reboot - works !!
 So "blind mode" is a red herring of some sort.
 
 But this might allow me to find some logs
 => No
 It appears that the failing boot never made it to the point of actually 
writing anything.
 I see:
  1. the original xenial
  2. the upgraded zesty
  3. NOT THE zesty+iommu
  4. the xenial+iommu

  $ egrep 'kernel:.*(Linux version|Command line)' /var/log/syslog 
  Apr  3 12:15:20 node-horsea kernel: [0.00] Linux version 
4.4.0-71-generic (buildd@lcy01-05) (gcc version 5.4.0 20160609 (Ubuntu 
5.4.0-6ubuntu1~16.04.4) ) #92-Ubuntu SMP Fri Mar 24 12:59:01 UTC 2017 (Ubuntu 
4.4.0-71.92-generic 4.4.49)
  Apr  3 12:15:20 node-horsea kernel: [0.00] Command line: 
BOOT_IMAGE=/boot/vmlinuz-4.4.0-71-generic 
root=UUID=2137c19a-d441-43fa-82e2-f2b7e3b2727b ro
  Apr  3 12:47:45 node-horsea kernel: [0.00] Linux version 
4.10.0-14-generic (buildd@lcy01-01) (gcc version 6.3.0 20170221 (Ubuntu 
6.3.0-8ubuntu1) ) 

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-09-15 Thread Narinder Gupta
I have subscribed Eddie and Ganesh from HPE to this bug to check whether they
have tested this with an older kernel or whether some parameter needs to be added.

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-05-02 Thread Joseph Salisbury
** Tags removed: kernel-key
** Tags added: kernel-da-key

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-18 Thread ChristianEhrhardt
Subscribing Narinder to map that to HPE if possible.

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-06 Thread ChristianEhrhardt
Hi Joseph,
no regression IMHO.
Only the frequency or signature of the issue changed with kernel upgrades.
I have not yet gone back further than Xenial, but that is worth a try as soon 
as I find time for it again.

I'd still almost think it is a FW issue, but then there is no better FW.
Do we have a way to mirror issues to HP as the HW manufacturer?

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-06 Thread Joseph Salisbury
Is this bug a regression?  Did this issue start happening after an
update/upgrade?  Was there a prior kernel version where you were not
having this particular problem?

It might be worth testing the Trusty or Precise kernel.

If it is a regression, we can perform a kernel bisect to identify the
commit that introduced this.

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-06 Thread ChristianEhrhardt
Hard to install when the system is so broken :-/
I installed the latest Mainline which is 4.11.0-041100rc5.201704022131 and 
enabled the intel_iommu on it.

- Reboot: ok
- Try to trigger with I/O
  - Fio: still ok
  - apt: working
  - random working on the system: crash

That said, it is verified to fail on 4.11.0-041100rc5.201704022131 as well.
The error message on that kernel is similar:

[ 5624.375286] DMAR: DRHD: handling fault status reg 2
[ 5624.397959] DMAR: [DMA Read] Request device [03:00.0] fault addr fbd85000 
[fault reason 06] PTE Read access is not set
[ 5686.804464] blk_update_request: I/O error, dev sda, sector 824203256
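
To collect such faults from a longer log, the DMAR lines can be reduced to
the requesting device, the access type and the fault address. This is only a
rough sketch: dmar_faults is a hypothetical helper name, and it assumes each
fault is on a single unwrapped line in exactly the format shown above (other
kernels may word the message differently).

```shell
# Hypothetical helper, not part of the original report.
# Reads kernel log text on stdin and prints "DEVICE ACCESS FAULT_ADDR"
# for every DMAR DMA fault line in the format quoted above.
dmar_faults() {
    sed -nE 's/.*DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] fault addr ([0-9a-f]+).*/\2 \1 \3/p'
}
```

Used as "dmesg | dmar_faults"; the reported device address can then be looked
up with "lspci -s" to see which device issued the faulting DMA.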

-- 
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-06 Thread Joseph Salisbury
Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.11 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".


Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11-rc5

-- 
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-04 Thread Joseph Salisbury
** Tags removed: kernel-da-key
** Tags added: kernel-key

-- 
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-04 Thread ChristianEhrhardt
After the reboot it took only about 10 minutes this time for it to hit me again :-/
There seems to be no reliable way to be sure anymore.

-- 
Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-04 Thread ChristianEhrhardt
** Attachment added: "failing again even after the FW upgrade to the latest version"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1679208/+attachment/4854345/+files/horsea-failing-again.txt

-- 
Status in linux package in Ubuntu:
  New

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-04 Thread ChristianEhrhardt
It seems I was happy too early about the FW update being the fix; the issue
still pops up, just not on boot.
After about 6 hours of work I ran into it again.

... attaching the latest dmesg messages

** Changed in: linux (Ubuntu)
   Status: Invalid => New

-- 
Status in linux package in Ubuntu:
  New

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-04 Thread ChristianEhrhardt
From there I ran some I/O, as on the other system we had the impression that
further I/O triggers it.
But the system is still fine, so the bug might be good documentation for the
next one hitting it, but the TL;DR is: "FW bug - FW update".

Per this conclusion I'm setting the kernel task to "invalid".

** Changed in: linux (Ubuntu)
   Status: Triaged => Invalid

** Changed in: linux (Ubuntu)
 Assignee: ChristianEhrhardt (paelzer) => (unassigned)

-- 
Status in linux package in Ubuntu:
  Invalid

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-04 Thread ChristianEhrhardt
Updated the system to the latest firmware for the storage controller, 4.52 
(and iLO to 2.50).
With that I upgraded to Zesty again, and everything is still working fine.

From here I enabled intel_iommu=on and it booted, which is already an
improvement.
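
To confirm that a booted kernel really runs with the option, one can look at the command line it was started with. A minimal sketch, assuming a POSIX shell; the helper name iommu_params is made up for illustration and is not part of any package:

```shell
# Hypothetical helper: list the IOMMU-related parameters found in a
# kernel command line. $1 is a file holding the command line and
# defaults to /proc/cmdline of the running system.
iommu_params() {
    # one parameter per line, then keep only the iommu ones
    tr -s ' ' '\n' < "${1:-/proc/cmdline}" | grep -E '^(intel_iommu|iommu)='
}

# Usage on the running system:
#   iommu_params
```

No output (and a non-zero exit status from grep) would mean that neither option made it onto the command line.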

Status in linux package in Ubuntu:
  Triaged

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
Unassigning myself, as "broken HW" no longer seems to be an option; over to
the kernel team to re-assign.

Please let me know what next steps you need from me.

Status in linux package in Ubuntu:
  Triaged

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
This was brought to my attention:
http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04805565

While it does not explain why the failure would be triggered by the IOMMU
(which should isolate accesses, not link them together, right?), it might
be worth the FW upgrade to verify whether it fixes the issue.

I'll report back once I have been able to do so.

Status in linux package in Ubuntu:
  Triaged

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
We just found that the second box we tested only needed some time (or
I/O) to run into the same failure.

[ 8710.266192] DMAR: DRHD: handling fault status reg 2
[ 8710.289318] DMAR: [DMA Read] Request device [03:00.0] fault addr f8bf5000 [fault reason 06] PTE Read access is not set
[ 8865.745527] blk_update_request: I/O error, dev sda, sector 349218832
[ 8865.775217] Buffer I/O error on device bcache0, logical block 19238912
[ 8865.804664] Buffer I/O error on device bcache0, logical block 19238913
[ 8865.834530] Buffer I/O error on device bcache0, logical block 19238914
[ 8865.864004] Buffer I/O error on device bcache0, logical block 19238915
[ 8865.893787] Buffer I/O error on device bcache0, logical block 19238916
[ 8865.923772] Buffer I/O error on device bcache0, logical block 19238917
[ 8865.953105] Buffer I/O error on device bcache0, logical block 19238918
[ 8865.982733] Buffer I/O error on device bcache0, logical block 19238919
[ 8866.012426] Buffer I/O error on device bcache0, logical block 19238920
[ 8866.041939] Buffer I/O error on device bcache0, logical block 19238921
[ 8866.071403] sd 0:0:1:0: rejecting I/O to offline device
[ 8866.095709] sd 0:0:1:0: rejecting I/O to offline device

Note: in this state the system is still alive, but the root filesystem has been remounted read-only.
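
To pull just the DMAR faults out of a longer kernel log, something like the following could be used. It is only a sketch: the function name dmar_faults is hypothetical, and it assumes each fault sits on a single line, as the kernel prints it, in the format shown above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<PCI device> <fault address> <fault reason>" for every DMAR fault.
# Lines that are not DMAR DMA faults are dropped.
dmar_faults() {
    sed -nE 's/.*DMAR: \[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] fault addr ([0-9a-f]+) \[fault reason ([0-9]+)\].*/\2 \3 \4/p'
}

# Usage on a live system (may need root to read the kernel log):
#   dmesg | dmar_faults | sort | uniq -c
```

Counting the result per device, as in the usage comment, shows quickly whether all faults come from the same PCI device, here 03:00.0.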

Status in linux package in Ubuntu:
  Triaged

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread Joseph Salisbury
** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Status: Incomplete => Triaged

** Tags added: kernel-da-key

Status in linux package in Ubuntu:
  Triaged

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
In what appeared to be a similar bug, https://bugzilla.redhat.com/show_bug.cgi?id=649766,
it was recommended to set iommu=pt, but in our case that does not help.

It changes the messages but still fails on boot:

[   75.256554] DMAR: [DMA Read] Request device [03:00.0] fault addr fec0e000 [fault reason 06] PTE Read access is not set
[  199.315689] blk_update_request: I/O error, dev sda, sector 1116802096
[  199.345283] EXT4-fs error (device sda2): ext4_find_entry:1463: inode #34865359: comm ureadahead: reading directory lblock 0
[  199.345284] blk_update_request: I/O error, dev sda, sector 399530240
[  199.345294] blk_update_request: I/O error, dev sda, sector 399532288
[  199.353290] sd 0:0:1:0: rejecting I/O to offline device
[  199.353314] sd 0:0:1:0: rejecting I/O to offline device
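
Adding a further option such as iommu=pt next to intel_iommu=on can be scripted. A rough sketch, assuming GNU sed and a defaults file such as /etc/default/grub.d/50-curtin-settings.cfg from the bug description; add_kernel_param is a hypothetical helper, not an existing tool, and it only handles simple parameters without regex or sed special characters:

```shell
# Hypothetical helper: add a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# $1: GRUB defaults file, $2: parameter, e.g. iommu=pt
add_kernel_param() {
    file="$1"
    param="$2"
    # Do nothing if the parameter is already present as a whole word.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Empty value: set it. Otherwise append after a space.
    # The bare "t" skips the second substitution when the first one applied.
    sed -i -E \
        -e "s/^(GRUB_CMDLINE_LINUX_DEFAULT=)\"\"/\1\"${param}\"/" \
        -e "t" \
        -e "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" \
        "$file"
}

# Usage (as root), followed by regenerating grub.cfg:
#   add_kernel_param /etc/default/grub.d/50-curtin-settings.cfg iommu=pt
#   update-grub
```

Unlike a plain substitution of the empty string, this also works when the line already carries other options, and running it twice does not duplicate the parameter.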

** Bug watch added: Red Hat Bugzilla #649766
   https://bugzilla.redhat.com/show_bug.cgi?id=649766

Status in linux package in Ubuntu:
  Incomplete

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
** Attachment added: "Fail on zesty right on boot"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1679208/+attachment/4853630/+files/fail-z.txt

Status in linux package in Ubuntu:
  New

Bug description:
  TL;DR
  - one of our HP ProLiant DL360 Gen9 fails to boot with intel_iommu=on
  - the Disk controller fails
  - Xenial seems to work for a while but then fails
  - Zesty 100% crashes on boot
  - An identical system seems to work, so need HW replace to finally confirm

  After a reboot, the hardware reports this during boot:
  Embedded RAID : Smart HBA H240ar Controller - Operation Failed
   - 1719-Slot 0 Drive Array  - A controller failure event occurred prior
 to this power-up. (Previous lock up code = 0x13)

  
  I tried several things (always redeploying Zesty with MAAS in between).
  I think my debugging might be helpful, and I wanted to keep the documentation 
in the bug in case you go another route or others find useful 
information in here.

  0. I retried what I did twice, fully reproducible
 That is:
 0.1 install zesty 
 0.2 change grub default cmdline in /etc/default/grub.d/50- to add 
intel_iommu=on
 0.3 sudo update-grub
 0.4 reboot


  1. I tried a Recovery boot from the boot options in grub.
 => Failed as well


  2. Rebooted through iLO via "request reboot" and also via "full system reset"
 => both Failed


  3. Reboot the system as deployed by MAAS
 # /proc/cmdline before that
 BOOT_IMAGE=/boot/vmlinuz-4.10.0-14-generic 
root=UUID=2137c19a-d441-43fa-82e2-f2b7e3b2727b ro
 The orig grub.cfg is like http://paste.ubuntu.com/24305945/
 It reboots as-is.
 => Reboot worked


  4. without a change to anything in /etc run update-grub
 $ sudo update-grub
 Generating grub configuration file ...
 Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT 
is set is no longer supported.
 Found linux image: /boot/vmlinuz-4.10.0-14-generic
 Found initrd image: /boot/initrd.img-4.10.0-14-generic
 Adding boot menu entry for EFI firmware configuration
 done

 There was no diff between the new grub.cfg and the one I saved.
 => Reboot worked


  5. add the intel_iommu=on arg
 $ sudo sed -i \
   's/GRUB_CMDLINE_LINUX_DEFAULT=""/GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"/' \
   /etc/default/grub.d/50-curtin-settings.cfg
 $ sudo update-grub
 # The diff in grub.cfg really is only the iommu setting
 => Reboot Failed
 So this doesn't seem to be so much of a cloud-init/curtin/maas bug anymore to me 
 - maybe intel_iommu behaves differently?
   - Check grub cfg pre/post: no change but the expected one


  6. Install Xenial and do the same
 => Reboot working


  7. Upgrade to Z
     Since the Xenial system just worked, and one can assume that almost only
     the kernel is running this early in the boot process, I upgraded the
     working system with intel_iommu=on to Zesty.
     That would be 4.4.0-71-generic to 4.10.0-1
     On this upgrade I finally saw my I/O errors again :-/
     Note: these issues are hard to miss as they mount root as read-only.
     I wonder if they only ever appear with intel_iommu=on, as this is the
     only combo I ever saw them with.


  8. Redeploy and upgrade to Z without intel_iommu=on enabled
     Then enable intel_iommu=on and reboot
     => Reboot Fail
     From here I rebooted into the Xenial kernel (which was still there,
     since this is an upgrade)
     Here I saw:
      Loading Linux 4.4.0-71-generic ...
      Loading initial ramdisk ...
      error: invalid video mode specification `text'.
      Booting in blind mode
     Hrm, as outlined above the "blind mode" might be a red herring, but since
     this kernel worked before it might still be a red herring that swims in
     the initrd that got regenerated on the upgrade.
     => Xenial Kernel Reboot - works !!
     So "blind mode" is a red herring of some sort.

     But this might allow finding some logs
     => No
     It appears the failing boot never made it to the point of actually
     writing anything.
     I see:
      1. the original xenial
      2. the upgraded zesty
      3. NOT THE zesty+iommu
      4. the xenial+iommu

  $ egrep 'kernel:.*(Linux version|Command line)' /var/log/syslog
  Apr  3 12:15:20 node-horsea kernel: [0.00] Linux version 4.4.0-71-generic (buildd@lcy01-05) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4) ) #92-Ubuntu SMP Fri Mar 24 12:59:01 UTC 2017 (Ubuntu 4.4.0-71.92-generic 4.4.49)
  Apr  3 12:15:20 node-horsea kernel: [0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-71-generic root=UUID=2137c19a-d441-43fa-82e2-f2b7e3b2727b ro
  Apr  3 12:47:45 node-horsea kernel: [0.00] 
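
The syslog check in step 8 (which boots got far enough to log anything) can be sketched as a small helper. This is only an illustration, not part of the original report: the function name list_boots and the labels it prints are hypothetical, and it assumes the default rsyslog line format shown in the output above. A boot that hung before syslog started, like the zesty+iommu one, simply leaves no line.

```shell
#!/bin/sh
# Hypothetical helper: print one line per boot found in a syslog file,
# with the timestamp, the kernel release, and whether intel_iommu=on was
# on the kernel command line.
# usage: list_boots /var/log/syslog
list_boots() {
    # $1: path to a syslog file (assumed to use the default rsyslog format)
    grep -E 'kernel:.*(Linux version|Command line)' "$1" | awk '
        BEGIN { rel = "?" }
        /Linux version/ {
            # the kernel release is the field right after "Linux version"
            for (i = 1; i < NF; i++)
                if ($i == "Linux" && $(i + 1) == "version") rel = $(i + 2)
        }
        /Command line:/ {
            iommu = ($0 ~ /intel_iommu=on/) ? "intel_iommu=on" : "no-iommu-arg"
            print $1, $2, $3, rel, iommu
            rel = "?"
        }'
}
```

Comparing the printed list against the boots actually attempted shows which ones never reached the point of writing logs.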

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
** Attachment added: "fail on x later after boot (minutes-hours into working fine)"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1679208/+attachment/4853629/+files/fail-x.txt

Title:
  Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with
  intel_iommu=on

Status in linux package in Ubuntu:
  New

[Kernel-packages] [Bug 1679208] Re: Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with intel_iommu=on

2017-04-03 Thread ChristianEhrhardt
Please note that on the "good" system nobody ever used iommu device
assignment; I'll do so in the next few days. That way we should also
learn if that can bring a good system into the failing mode.

Title:
  Zesty (4.10.0-14) won't boot on HP ProLiant DL360 Gen9 with
  intel_iommu=on

Status in linux package in Ubuntu:
  New
