Public bug reported:

HP DL580G7
MAAS version: 2.3.0 (6434-gd354690-0ubuntu1~16.04.1)
cloud-init.log and syslog attached.
The system normally deploys fine as part of a MAAS cluster. The issue only
occurs when trying to juju deploy a Kubernetes worker to this node. According
to syslog, the hang occurs while loading docker.io. Once the system hangs, I
can't ssh into it, and the console shows the error:

netenp4s0f0 Firmware Hang Detected
.
.
.
netxen_nic enp... Device Initialization error


uname -a
Linux gpu-server 4.13.0-32-generic #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43 
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux


ethtool -i enp4s0f0

driver: netxen_nic
version: 4.0.82
firmware-version: 4.0.596
expansion-rom-version: 
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


dmesg |grep netxen

[    3.284815] netxen_nic 0000:04:00.0: 2MB memory map
[    3.537593] netxen_nic 0000:04:00.0: Gen2 strapping detected
[    3.537682] netxen_nic 0000:04:00.0: using 64-bit dma mask
[    3.828914] netxen_nic: NX3031 Gigabit Ethernet Board S/N 
\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff\xffffffff
  Chip rev 0x42
[    3.828918] netxen_nic 0000:04:00.0: Driver v4.0.82, firmware v4.0.596 
[legacy]
[    3.829102] netxen_nic 0000:04:00.0: using msi-x interrupts
[    3.829105] netxen_nic 0000:04:00.0: non ULA adapter
[    3.829372] netxen_nic 0000:04:00.0: eth0: GbE port initialized
[    3.850279] netxen_nic 0000:04:00.1: 2MB memory map
[    3.850489] netxen_nic 0000:04:00.1: using 64-bit dma mask
[    4.080327] netxen_nic 0000:04:00.1: Driver v4.0.82, firmware v4.0.596 [legacy]
[    4.080588] netxen_nic 0000:04:00.1: using msi-x interrupts
[    4.080874] netxen_nic 0000:04:00.1: eth1: GbE port initialized
[    4.132499] netxen_nic 0000:04:00.2: 2MB memory map
[    4.132707] netxen_nic 0000:04:00.2: using 64-bit dma mask
[    4.188596] netxen_nic 0000:04:00.2: Driver v4.0.82, firmware v4.0.596 [legacy]
[    4.192717] netxen_nic 0000:04:00.2: using msi-x interrupts
[    4.196837] netxen_nic 0000:04:00.2: eth2: GbE port initialized
[    4.317290] netxen_nic 0000:04:00.3: 2MB memory map
[    4.321319] netxen_nic 0000:04:00.3: using 64-bit dma mask
[    4.388142] netxen_nic 0000:04:00.3: Driver v4.0.82, firmware v4.0.596 [legacy]
[    4.392254] netxen_nic 0000:04:00.3: using msi-x interrupts
[    4.396445] netxen_nic 0000:04:00.3: eth3: GbE port initialized
[    4.403938] netxen_nic 0000:04:00.2 enp4s0f2: renamed from eth2
[    4.524474] netxen_nic 0000:04:00.0 enp4s0f0: renamed from eth0
[    4.564541] netxen_nic 0000:04:00.1 enp4s0f1: renamed from eth1
[    4.600619] netxen_nic 0000:04:00.3 enp4s0f3: renamed from eth3
[   19.290490] netxen_nic: enp4s0f0 NIC Link is up
[   21.606695] netxen_nic: enp4s0f1 NIC Link is up
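
For anyone triaging, the driver and firmware versions of all four ports can be condensed from the dmesg output above into one line per PCI function. This is only a sketch and was not part of the original report: the function name netxen_fw_summary is hypothetical, and it assumes the "Driver vX, firmware vY" line format shown above.

```shell
# Hypothetical helper: summarize netxen_nic driver/firmware versions per PCI function.
# Reads dmesg-style text on stdin and prints one line per matching entry.
# Assumes the "netxen_nic <bus>: Driver vX, firmware vY" format seen in this report.
netxen_fw_summary() {
    sed -n 's/.*netxen_nic \([0-9a-f:.]*\): Driver v\([0-9.]*\), firmware v\([0-9.]*\).*/\1 driver=\2 firmware=\3/p'
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | netxen_fw_summary
```

A mismatch between ports would stand out immediately in that summary; in the output above all four functions report the same versions.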

Last entries in syslog before the NICs crash:

Feb 17 19:08:57 gpu-server systemd[1]: Started ACPI event daemon.
Feb 17 19:08:57 gpu-server systemd[1]: Starting Docker Socket for the API.
Feb 17 19:08:57 gpu-server systemd[1]: Listening on Docker Socket for the API.
Feb 17 19:08:57 gpu-server systemd[1]: Starting Docker Application Container Engine...
Feb 17 19:08:58 gpu-server dockerd[18889]: time="2018-02-17T19:08:58.032297862Z" level=info msg="libcontainerd: new containerd process, pid: 18913"
Feb 17 19:08:59 gpu-server kernel: [  327.615084] audit: type=1400 audit(1518894539.111:43): apparmor="STATUS" operation="profile_load" profile="unconfined" name="docker-default" pid=18928 comm="apparmor_parser"
Feb 17 19:08:59 gpu-server kernel: [  327.662287] aufs 4.13-20170911
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.358149475Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.358893722Z" level=warning msg="Your kernel does not support swap memory limit"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.359058327Z" level=warning msg="Your kernel does not support cgroup rt period"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.359106384Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.360798923Z" level=info msg="Loading containers: start."
Feb 17 19:08:59 gpu-server kernel: [  327.889472] Bridge firewalling registered
Feb 17 19:08:59 gpu-server kernel: [  327.924904] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
Feb 17 19:08:59 gpu-server dockerd[18889]: time="2018-02-17T19:08:59.456778799Z" level=info msg="Firewalld running: false"
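
The last kernel messages before the hang are easier to read without the dockerd output interleaved. A filter along these lines could pull them out of the attached syslog. It is a sketch only: the function name kernel_tail is hypothetical, and it assumes the usual "<date> <host> kernel: [timestamp] message" syslog layout with one entry per line.

```shell
# Hypothetical helper: show the last N kernel messages from syslog text on stdin.
# $1 is the number of trailing kernel lines to print (defaults to 20).
kernel_tail() {
    grep ' kernel: \[' | tail -n "${1:-20}"
}

# Usage against a saved log file:
#   kernel_tail 10 < syslog
```

This only narrows down which kernel messages were logged last; it does not by itself show what triggered the firmware hang.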

** Affects: linux-firmware (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "log.tar"
   https://bugs.launchpad.net/bugs/1750176/+attachment/5057440/+files/log.tar

https://bugs.launchpad.net/bugs/1750176

Title:
  Firmware Hang detected on HP netxen NIC

Status in linux-firmware package in Ubuntu:
  New
