[Bug 1915413] Re: Milan Delta A100 GPU fails to detect on Ubuntu 18.04 and 20.04

2022-01-27 Thread Alec Duroy
The issue was fixed after the firmware was upgraded and SR-IOV was
enabled in the BIOS.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1915413

Title:
  Milan Delta A100 GPU fails to detect on Ubuntu 18.04 and 20.04

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/1915413/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1915413] Re: Milan Delta A100 GPU fails to detect on Ubuntu 18.04 and 20.04

2021-02-11 Thread Alec Duroy
Additional information:

On a Rome-based Delta A100 system, SR-IOV must be enabled in the BIOS
for the NVIDIA drivers and Fabric Manager to install successfully. If
it is disabled, the system behaves the same way as the Milan-based
Delta A100 system. This is evident on both 18.04 and 20.04 installs.

With a RHEL 8 install, however, the Fabric Manager service works fine
whether SR-IOV is enabled or disabled in the BIOS, and nvidia-smi
displays all 8 GPUs as expected.
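
A quick way to compare the two cases is to count the GPUs that
nvidia-smi enumerates. The sketch below is only an illustration: it
assumes `nvidia-smi -L` prints one line per GPU starting with
"GPU <index>:", and the function name count_gpus and the expected
count of 8 are my own choices for this system.

```shell
# count_gpus: read `nvidia-smi -L` style output on stdin and print the
# number of lines that start with "GPU <index>:".
# `|| true` keeps the exit status at 0 when grep finds no match.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Example use on a live system (assumes nvidia-smi is installed):
#   found=$(nvidia-smi -L | count_gpus)
#   [ "$found" -eq 8 ] || echo "expected 8 GPUs, found $found"
```

Running the same check with SR-IOV enabled and then disabled should
make the difference between the Ubuntu and RHEL 8 installs easy to
record.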

[Bug 1898808] Re: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut after images update on Oct 5

2020-10-20 Thread Alec Duroy
Hi Jeffrey,

Here's the output of the commands you requested:

1. certuser@maas216-cert:~$ ls -l /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily

total 460480
-rw-r--r-- 3 maas maas  86260608 Oct  6 00:25 boot-initrd
-rw-r--r-- 3 maas maas  11678464 Sep 29 01:57 boot-kernel
-rw-r--r-- 4 maas maas 373587968 Oct  6 00:25 squashfs


2. certuser@maas216-cert:~$ sha256sum /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/*

273212b1858bc4441b392900123f1433733023986d542e8d2caf458fbb48edb2  /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-initrd
1a8aec22331f411cdbc61c367b7397cf0d6cda7a85afc94c7ebb64ec478c32b8  /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/boot-kernel
0caa3059361ab22a75f9797834ff7bcce372621919122a7d8289b66a1a9c8084  /var/lib/maas/boot-resources/current/ubuntu/amd64/ga-20.04/focal/daily/squashfs
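
If the aim is to compare these digests against known-good values, a
small helper could do the comparison. This is a sketch only: it assumes
reference digests are available from wherever the images were
published, and the function name verify_sha256 is hypothetical.

```shell
# verify_sha256 FILE EXPECTED: succeed only if FILE's SHA-256 digest
# equals EXPECTED (a 64-character hex string).
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d' ' -f1) || return 1
    [ "$actual" = "$2" ]
}

# Example (the expected value is a placeholder, not a real digest):
#   verify_sha256 boot-kernel "<expected sha256>" && echo OK || echo MISMATCH
```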


Thanks
Alec

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1898808

Title:
  [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut
  after images update on Oct 5

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1898808/+subscriptions

[Bug 1898808] Re: [maas][focal]unable to deploy 20.04(focal) w/ default kernel for sut

2020-10-14 Thread Alec Duroy
This is to confirm the issue reported here. I have the same issue with
two other systems that I’m trying to certify with 20.04 LTS; neither
system could be deployed. A screenshot is attached below.

Thanks
Alec


** Attachment added: "Boot-Issue-when-deployimg-20.04LTS-for-hardware-server-certification"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1898808/+attachment/5422312/+files/Boot-up-issue-with-Maas-deploying-20.04LTS.jpg

[Bug 1810548] Re: Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15

2019-01-05 Thread Alec Duroy
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1810548

Title:
  Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel
  4.15

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810548/+subscriptions

[Bug 1810548] Re: Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15

2019-01-04 Thread Alec Duroy
** Attachment added: "syslog.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810548/+attachment/5226959/+files/syslog.txt

[Bug 1810548] [NEW] Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15

2019-01-04 Thread Alec Duroy
Public bug reported:

Bug Description:
This problem seems related to the following reported bugs:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1737934
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1682704
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748

System Configuration:
The server is populated with a hot-swappable 1.9TB 7mm U.2 NVMe drive and a
1.9TB 2.5” SATA SSD; either drive can serve as the system boot drive. In
addition, there are twelve (12) 7TB 3.5” SAS HDDs for RAID data storage.
Currently, the SATA SSD is used as the boot device. Below is the output of
lsb_release and cat /proc/cmdline.

Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic

BOOT_IMAGE=/boot/vmlinuz-4.15.0-43-generic root=UUID=20881db9-36c3-4d83-9688-da36477e62e3 ro


Problem:
On both Xenial and Bionic running kernel 4.15, the NVMe drive has severe
problems: it intermittently disappears from the output of the lsblk and nvme
list commands while spewing a lot of Buffer I/O errors on screen. The lspci
output shows that the controller intermittently disconnects from the PCIe
bus.

Symptom:
On the first boot after a fresh install, the NVMe drive is recognized by the
lsblk, lspci and nvme list commands. The drive then goes missing for no
obvious reason and reappears at random, and these restart events cycle
continuously even when the drive is idle. The dmesg, syslog and journal
output all show the same thing: an unexpected shutdown of the drive. See the
output below (for the full text, see the attached dmesg, syslog and
journalctl files).
-
[  485.296413] nvme nvme1: pci function :18:00.0
[  485.296516] nvme :18:00.0: enabling device (0100 -> 0102)
[  485.406992] nvme nvme1: Shutdown timeout set to 8 seconds
[  485.409433] nvme nvme1: failed to mark controller state 1
[  485.409435] nvme nvme1: Removing after probe failure status: 0
-
This symptom is present on both 16.04 LTS and 18.04 LTS running kernel 4.15.
The same symptom was also evident with upstream kernels 4.18.3 and 4.18.19.
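
To gauge how often the controller drops out, the removal message in the
excerpt above can be counted in a saved log. The following is a rough
sketch; the function name count_probe_failures is made up for
illustration, and it matches only the exact message text shown above.

```shell
# count_probe_failures: read kernel log text on stdin and print how many
# times an NVMe controller was removed after a failed probe.
# `|| true` keeps the exit status at 0 when there are no matches.
count_probe_failures() {
    grep -c 'nvme nvme[0-9]*: Removing after probe failure' || true
}

# Example use against a saved log:
#   count_probe_failures < dmesg.txt
```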

Work-around:
With kernel 4.19, this symptom does not manifest; the random disconnection of
the NVMe controller from the PCIe bus no longer occurs. In addition, lspci,
lsblk and nvme list all display the drive consistently, and the NVMe drive
never goes missing during a scan.

How to reproduce:
1. Use a Samsung U.2 NVMe SM961/PM961 drive.
2. Install either Ubuntu 16.04.5 or 18.04.1 running kernel 4.15.
3. Boot and wait until the system has recognized all the installed drives.
4. Execute the lsblk, lspci and nvme list commands repeatedly. The NVMe drive
will randomly go missing and spew out a lot of buffer I/O errors.
With 16.04.5 LTS, the symptom is more severe: after a fresh install, the
drive appears once, but once it goes missing it never comes back.
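
Step 4 could be automated with a small polling loop. The sketch below
assumes the drive shows up as nvme1n1 in lsblk (a guess based on the
nvme1 controller in the log; the actual name may differ), and the
function names has_device and watch_device are illustrative.

```shell
# has_device NAME: read `lsblk -dn -o NAME` output on stdin and succeed
# if NAME is listed as a whole line.
has_device() {
    grep -qx -- "$1"
}

# watch_device NAME COUNT INTERVAL: check COUNT times, INTERVAL seconds
# apart, printing a timestamped line for each check so drop-outs can be
# lined up against dmesg.
watch_device() {
    i=0
    while [ "$i" -lt "$2" ]; do
        if lsblk -dn -o NAME | has_device "$1"; then
            echo "$(date '+%F %T') $1 present"
        else
            echo "$(date '+%F %T') $1 MISSING"
        fi
        i=$((i + 1))
        sleep "$3"
    done
}

# Example: poll once a second for ten minutes.
#   watch_device nvme1n1 600 1
```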

** Affects: linux (Ubuntu)
 Importance: Undecided
 Status: Incomplete


** Tags: bionic

** Attachment added: "dmesg.txt"
   https://bugs.launchpad.net/bugs/1810548/+attachment/5226958/+files/dmesg.txt

[Bug 1810548] Re: Samsung U.2 NVMe SM961/PM961 randomly will go missing under kernel 4.15

2019-01-04 Thread Alec Duroy
** Attachment added: "journalctl-xb.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810548/+attachment/5226960/+files/journalctl-xb.txt

[Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS

2018-10-22 Thread Alec Duroy
'kernel-fixed-upstream'

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1798127

Title:
  CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as
  root FS

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1798127/+subscriptions

[Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS

2018-10-22 Thread Alec Duroy
Successfully upgraded to kernel 4.19. The stress-ng memory test passed
without any lockup; the newly built 4.19 kernel has fixed the issue. See the
output below:
---
ubuntu@fluent-orca:~$ uname -r
4.19.0-041900rc8-generic

ubuntu@fluent-orca:~$ sudo stress-ng -k --aggressive --verify --timeout 300 --stack 0
stress-ng: info:  [3516] dispatching hogs: 112 stack
stress-ng: info:  [3516] successful run completed in 311.37s (5 mins, 11.37 secs)
ubuntu@fluent-orca:~$
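
To double-check a run like this, the kernel log can be searched for
soft-lockup reports afterwards. This is a minimal sketch that assumes
the kernel reports them with the string "BUG: soft lockup"; the
function name count_soft_lockups is illustrative.

```shell
# count_soft_lockups: read kernel log text on stdin and print the number
# of soft-lockup reports it contains (0 means none were logged).
# `|| true` keeps the exit status at 0 when there are no matches.
count_soft_lockups() {
    grep -c 'BUG: soft lockup' || true
}

# Example use after the stress-ng run (dmesg may require sudo):
#   sudo dmesg | count_soft_lockups
```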


[Bug 1798127] Re: CPU Soft Lockups when stress-ng stack stressor runs with M.2 NVMe as root FS

2018-10-17 Thread Alec Duroy
** Changed in: linux (Ubuntu)
   Status: Triaged => Confirmed

[Bug 1437353] Re: UEFI network boot hangs at grub for adapter 82599ES 10-Gigabit SFI/SFP+

2018-08-17 Thread Alec Duroy
Hi,
I was also affected by this issue when PXE-booting in UEFI mode from an
add-on i350 1Gb adapter. Upgrading the MAAS server to the 18.04 (bionic)
release did not resolve the symptom. However, I found a workaround: boot over
IPv6 first, before IPv4. This is the same workaround Rod Smith mentioned in
comment #21 on Bug #1437024. The details are filed under Bug #1787637.
Thanks

Additional info:
i350 FW released for AOC is v1.63
MAAS server grub version = 2.02-2ubuntu8.2 
Node deployed successfully due to workaround =  2.02-beta2-36ubuntu3.18

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1437353

Title:
  UEFI network boot hangs at grub for adapter 82599ES 10-Gigabit
  SFI/SFP+

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1437353/+subscriptions

[Bug 1437024] Re: Failure to PXE-boot from secondary NIC

2018-08-17 Thread Alec Duroy
Hi,
I was also affected by this issue when PXE-booting in UEFI mode from an
add-on i350 1Gb adapter. Upgrading the MAAS server to the 18.04 (bionic)
release did not resolve the symptom. However, I found a workaround: boot over
IPv6 first, before IPv4. This is the same workaround Rod Smith mentioned in
comment #21 on Bug #1437024. The details are filed under Bug #1787637.
Thanks

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1437024

Title:
  Failure to PXE-boot from secondary NIC

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1437024/+subscriptions

[Bug 1767018] [NEW] NO test plan for 18.04 server certification

2018-04-25 Thread Alec Duroy
Public bug reported:

The server certification for the 18.04 beta release does not appear in
the "select test plan" list. See the attachment.

** Affects: grub2 (Ubuntu)
 Importance: Undecided
 Status: New

** Attachment added: "Screen snapshot"
   https://bugs.launchpad.net/bugs/1767018/+attachment/5127379/+files/Screenshot%20from%202018-04-19%2018-48-30.png

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1767018

Title:
  NO test plan for 18.04 server certification

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1767018/+subscriptions
