[Bug 1915413] Re: Milan Delta A100 GPU fails to detect on Ubuntu 18.04 and 20.04
The issue was fixed after the firmware was upgraded and SR-IOV was enabled in the BIOS.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1915413

Title:
  Milan Delta A100 GPU fails to detect on Ubuntu 18.04 and 20.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/1915413/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1915413] Re: Milan Delta A100 GPU fails to detect on Ubuntu 18.04 and 20.04
Additional information:

On the Rome-based Delta A100 system, the SR-IOV feature must be enabled in the BIOS for the NVIDIA drivers and Fabric Manager to install successfully. If it is disabled, the system behaves like the Milan-based Delta A100 system. This occurs on both 18.04 and 20.04 installs.

With a RHEL 8 install, however, the Fabric Manager service works whether SR-IOV is enabled or disabled in the BIOS, and nvidia-smi displays all 8 GPUs as expected.
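A minimal sketch of how the "all 8 GPUs" check above could be scripted. It assumes `nvidia-smi -L` prints one line per GPU beginning with `GPU <index>:`; the helper names `count_gpus` and `list_gpus` are hypothetical and not part of any NVIDIA tool.

```python
import re
import subprocess

EXPECTED_GPUS = 8  # the report above expects 8 GPUs on this system


def count_gpus(listing: str) -> int:
    """Count lines that look like 'GPU <index>: ...' in `nvidia-smi -L` output."""
    return sum(1 for line in listing.splitlines() if re.match(r"^GPU \d+:", line))


def list_gpus() -> str:
    """Run `nvidia-smi -L`; return an empty string if it is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    found = count_gpus(list_gpus())
    print(f"{found} of {EXPECTED_GPUS} GPUs detected")
```

Running it once with SR-IOV enabled and once with it disabled would give a quick side-by-side count for each BIOS setting.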
[Bug 1898808] Re: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut after images update on Oct 5
Hi Jeffrey,

Here's the output of the commands you requested:

1. certuser@maas216-cert:~$ ls -l /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily
   total 460480
   -rw-r--r-- 3 maas maas  86260608 Oct  6 00:25 boot-initrd
   -rw-r--r-- 3 maas maas  11678464 Sep 29 01:57 boot-kernel
   -rw-r--r-- 4 maas maas 373587968 Oct  6 00:25 squashfs

2. certuser@maas216-cert:~$ sha256sum /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/*
   273212b1858bc4441b392900123f1433733023986d542e8d2caf458fbb48edb2  /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-initrd
   1a8aec22331f411cdbc61c367b7397cf0d6cda7a85afc94c7ebb64ec478c32b8  /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-kernel
   0caa3059361ab22a75f9797834ff7bcce372621919122a7d8289b66a1a9c8084  /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/squashfs

Thanks
Alec
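The checksums above were presumably requested so they could be compared against known-good values. A minimal sketch of such a comparison follows; the `expected` mapping is a hypothetical input that would have to come from a trusted source, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches(directory: Path, expected: Dict[str, str]) -> List[str]:
    """Return names whose file is missing or whose digest differs from `expected`."""
    bad = []
    for name, want in expected.items():
        target = directory / name
        if not target.is_file() or sha256_of(target) != want:
            bad.append(name)
    return bad
```

For example, `mismatches(Path("/var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily"), expected)` would return the names of any of `boot-initrd`, `boot-kernel`, or `squashfs` that do not match the supplied digests.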
[Bug 1898808] Re: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut
This is to confirm the issue reported here. I see the same issue on two other systems that I'm trying to certify with 20.04 LTS. Neither system could be deployed. See the attached snapshot.

Thanks
Alec

** Attachment added: "Boot-Issue-when-deployimg-20.04LTS-for-hardware-server-certification"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898808/+attachment/5422312/+files/Boot-up-issue-with-Maas-deploying-20.04LTS.jpg
[Bug 1810548] Re: Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15
** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed
[Bug 1810548] Re: Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15
** Attachment added: "syslog.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810548/+attachment/5226959/+files/syslog.txt
[Bug 1810548] [NEW] Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15
Public bug reported:

Bug Description:
This problem seems related to the following reported bugs:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1737934
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1682704
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748

System Configuration:
The server is populated with a hot-swappable 1.9TB 7mm U.2 NVMe drive and a 1.9TB 2.5” SATA SSD; either drive can serve as the system boot drive. In addition, there are twelve (12) 7TB 3.5” SAS HDDs for data RAID storage. Currently, the SATA SSD is used as the boot device. Below is the output of lsb_release and cat /proc/cmdline.

  Distributor ID: Ubuntu
  Description:    Ubuntu 18.04.1 LTS
  Release:        18.04
  Codename:       bionic

  BOOT_IMAGE=/boot/vmlinuz-4.15.0-43-generic root=UUID=20881db9-36c3-4d83-9688-da36477e62e3 ro

Problem:
On both Xenial and Bionic running kernel 4.15, the NVMe drive has severe problems: it intermittently disappears from the lsblk and nvme list output while spewing many Buffer I/O errors on screen. The lspci output shows that the controller intermittently disconnects from the PCIe bus.

Symptom:
On the first boot after a fresh install, the NVMe drive is recognized by the lsblk, lspci and nvme list commands. The drive then goes missing for no obvious reason and reappears at random, and these restart events cycle continuously even when the drive is idle. The dmesg, syslog and journal output all show the same thing: an unexpected shutdown of the drive. See the output below (for the full text, see the attached dmesg, syslog and journalctl files).

  [ 485.296413] nvme nvme1: pci function :18:00.0
  [ 485.296516] nvme :18:00.0: enabling device (0100 -> 0102)
  [ 485.406992] nvme nvme1: Shutdown timeout set to 8 seconds
  [ 485.409433] nvme nvme1: failed to mark controller state 1
  [ 485.409435] nvme nvme1: Removing after probe failure status: 0

This symptom is present on both 16.04 LTS and 18.04 LTS running kernel 4.15. The same symptom was also seen on upstream kernels 4.18.3 and 4.18.19.

Work-around:
With kernel 4.19, the symptom does not manifest; the random disconnection of the NVMe controller from the PCIe bus is fixed. In addition, the lspci, lsblk and nvme list commands consistently display the drive, with no missing NVMe drive during a scan.

How to reproduce:
1. Use a Samsung U.2 NVMe SM961/PM961 drive.
2. Install either Ubuntu 16.04.5 or 18.04.1 running kernel 4.15.
3. Boot and wait until the system has recognized all the installed drives.
4. Execute the lsblk, lspci and nvme list commands repeatedly. The NVMe drive will randomly go missing and spew a lot of buffer I/O errors.

With 16.04.5 LTS, the symptom is more severe: after a fresh install the drive appears once, but once it goes missing it never comes back.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

** Tags: bionic

** Attachment added: "dmesg.txt"
   https://bugs.launchpad.net/bugs/1810548/+attachment/5226958/+files/dmesg.txt
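To quantify how often the controller drops out, the removal message quoted in the report can be counted in a saved log. This is a rough sketch that assumes the log lines keep the same `[ seconds] nvme nvmeN: Removing after probe failure` shape shown in the report; the function name `removal_events` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the removal message quoted in the report, e.g.
# "[ 485.409435] nvme nvme1: Removing after probe failure status: 0"
REMOVAL = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): Removing after probe failure"
)


def removal_events(log_text: str) -> List[Tuple[float, str]]:
    """Return (timestamp_seconds, controller) for each controller removal in the log."""
    return [
        (float(m.group("time")), m.group("ctrl"))
        for m in REMOVAL.finditer(log_text)
    ]
```

Feeding it the contents of the attached dmesg.txt would list each removal with its kernel timestamp, which makes it easier to compare runs on 4.15 and 4.19.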
[Bug 1810548] Re: Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15
** Attachment added: "journalctl-xb.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810548/+attachment/5226960/+files/journalctl-xb.txt
[Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS
'kernel-fixed-upstream'
[Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS
Successfully upgraded to kernel 4.19. The stress-ng memory test passed without any lockup. The new 4.19 kernel build has fixed the issue. See the output below:

---
ubuntu@fluent-orca:~$ uname -r
4.19.0-041900rc8-generic
ubuntu@fluent-orca:~$ sudo stress-ng -k --aggressive --verify --timeout 300 --stack 0
stress-ng: info: [3516] dispatching hogs: 112 stack
stress-ng: info: [3516] successful run completed in 311.37s (5 mins, 11.37 secs)
ubuntu@fluent-orca:~$
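Before re-running the stress test, it can help to confirm the running kernel is at least the version reported as fixed. A small sketch follows, assuming a release string in the `uname -r` form shown above. The helper names are hypothetical, and a plain version comparison cannot tell whether a distribution kernel carries a backported fix.

```python
import re
from typing import Tuple


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '4.19.0-041900rc8-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_fix(release: str, fixed_in: Tuple[int, int] = (4, 19)) -> bool:
    """True if the release is at or above the version reported as fixed (4.19 here)."""
    return kernel_major_minor(release) >= fixed_in
```

On a live system, `has_fix(platform.release())` (after `import platform`) gives the answer for the running kernel.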
[Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS
** Changed in: linux (Ubuntu)
       Status: Triaged => Confirmed
[Bug 1437353] Re: UEFI network boot hangs at grub for adapter 82599ES 10-Gigabit SFI/SFP+
Hi,

I was also affected by this issue when PXE booting in UEFI mode from an add-on adapter's i350 1Gb interface. Upgrading the MAAS server to the 18.04 (bionic) release did not resolve the symptom. However, I found a workaround: boot IPv6 first, before IPv4. This is the same workaround Rod Smith mentioned in comment #21 on Bug #1437024. The details were filed under Bug #1787637.

Thanks

Additional info:
i350 FW released for AOC is v1.63
MAAS server grub version = 2.02-2ubuntu8.2
Node deployed successfully due to workaround = 2.02-beta2-36ubuntu3.18
[Bug 1437024] Re: Failure to PXE-boot from secondary NIC
Hi,

I was also affected by this issue when PXE booting in UEFI mode from an add-on adapter's i350 1Gb interface. Upgrading the MAAS server to the 18.04 (bionic) release did not resolve the symptom. However, I found a workaround: boot IPv6 first, before IPv4. This is the same workaround Rod Smith mentioned in comment #21 on Bug #1437024. The details were filed under Bug #1787637.

Thanks
[Bug 1767018] [NEW] NO test plan for 18.04 server certification
Public bug reported:

The server certification test plan for the 18.04 beta release does not appear in the test plan selection list. See the attachment.

** Affects: grub2 (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "Screen snapshot"
   https://bugs.launchpad.net/bugs/1767018/+attachment/5127379/+files/Screenshot%20from%202018-04-19%2018-48-30.png