[Kernel-packages] [Bug 1944005] Re: System Lockup during failover

2021-12-13 Thread Steven Parker
*** This bug is a duplicate of bug 1944586 ***
https://bugs.launchpad.net/bugs/1944586

Duplicate of a bug which is now fixed in the stable kernel:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1944586/comments/8


** This bug has been marked a duplicate of bug 1944586
   kernel bug found when disconnecting one fiber channel interface on Cisco 
Chassis with fnic DRV_VERSION "1.6.0.47"

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1944005

Title:
  System Lockup during failover

Status in linux package in Ubuntu:
  Incomplete
Status in multipath-tools package in Ubuntu:
  Confirmed

Bug description:
  When testing failover for configured multipath devices, I am seeing
  some nodes lock up and become unresponsive.  These nodes have frozen
  consoles but can still be pinged, so part of the network stack is
  functional, yet ssh to them is not possible.

  1) Ubuntu 20.04 (focal)
  2) multipath-tools 0.8.3-1ubuntu2
  3) Expected: multipath failover succeeds across all nodes and the nodes
  remain responsive
  4) Actual: some of the nodes lock up when attempting to fail over and
  cannot be reached via ssh or the console

  Here is the ubuntu-bug output:
  https://pastebin.canonical.com/p/6MyWBggtS2/

  Here is the kern.log: https://pastebin.canonical.com/p/rG42gzXG9X/

  Here is the syslog: https://pastebin.canonical.com/p/C99ZprphZn/
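
  A sketch of the kind of checks that can be run on a node while exercising
  a failover (illustrative only; map and device names differ per host):

  # Show the multipath topology and per-path states (active/failed/faulty)
  sudo multipath -ll
  # Ask the running multipath daemon for its current view of the paths
  sudo multipathd show paths
  # Follow kernel messages for SCSI/FC/device-mapper events during failover
  journalctl -k -f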

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1944005/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1944586] Re: kernel bug found when disconnecting one fiber channel interface on Cisco Chassis with fnic DRV_VERSION "1.6.0.47"

2021-09-27 Thread Steven Parker
** Description changed:

  [Impact]
  
  The following has been brought to my attention:
  
  "
  We have been experiencing node lockups and degradation when testing fiber
  channel failover for multipath PURESTORAGE drives.
  
  Testing usually consists of either failing over the fabric or the local
  I/O module for the Cisco chassis which houses a number of individual
  blades.
  
  After rebooting a local chassis I/O module we see commands like
  multipath -ll hanging.
  Resetting the blade's individual fiber channel interface results in the
  following messages.
  "
  
  [6051160.241383]  rport-9:0-1: blocked FC remote port time out: removing 
target and saving binding
  [6051160.252901] BUG: kernel NULL pointer dereference, address: 
0040
  [6051160.262267] #PF: supervisor read access in kernel mode
  [6051160.269314] #PF: error_code(0x) - not-present page
  [6051160.276016] PGD 0 P4D 0
  [6051160.279807] Oops:  [#1] SMP NOPTI
  [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P   O 
 5.4.0-77-generic #86-Ubuntu
  [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, 
BIOS B200M5.4.1.1d.0.0609200543 06/09/2020
  [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport 
[scsi_transport_fc]
  [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic]
  [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 
48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 
78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01
  [6051160.346553] RSP: 0018:bc224f297d90 EFLAGS: 00010082
  [6051160.353115] RAX:  RBX: 90abdd4c4b00 RCX: 
90d8ab2c2bb0
  [6051160.361983] RDX: 90d8b5467400 RSI:  RDI: 
90d8ab3b4b40
  [6051160.370812] RBP: bc224f297df8 R08: 90d8c08978c8 R09: 
90d8b8850800
  [6051160.379518] R10: 90d8a59d64c0 R11: 0001 R12: 
90d8ab2c31f8
  [6051160.388242] R13:  R14: 0246 R15: 
90d8ab2c27b8
  [6051160.396953] FS:  () GS:90d8c088() 
knlGS:
  [6051160.406838] CS:  0010 DS:  ES:  CR0: 80050033
  [6051160.414168] CR2: 0040 CR3: 000fc1c0a004 CR4: 
007626e0
  [6051160.423146] DR0:  DR1:  DR2: 

  [6051160.431884] DR3:  DR6: fffe0ff0 DR7: 
0400
  [6051160.440615] PKRU: 5554
  [6051160.444337] Call Trace:
  [6051160.447841]  fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc]
  [6051160.455263]  fc_timeout_deleted_rport.cold+0x1bc/0x2c7 
[scsi_transport_fc]
  [6051160.463623]  process_one_work+0x1eb/0x3b0
  [6051160.468784]  worker_thread+0x4d/0x400
  [6051160.473660]  kthread+0x104/0x140
  [6051160.478102]  ? process_one_work+0x3b0/0x3b0
  [6051160.483439]  ? kthread_park+0x90/0x90
  [6051160.488213]  ret_from_fork+0x1f/0x40
  [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) 
zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter 
ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle 
iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock 
unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 
vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag 
inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter 
bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common 
skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp 
kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca 
ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid 
sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor
  [6051160.492928]  async_tx xor raid6_pq libcrc32c raid1 raid0 multipath 
linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper 
crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect 
ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid 
cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi
  [6051160.632623] CR2: 0040
  [6051160.637043] ---[ end trace 236e6f4850146477 ]---
  
  [Test Plan]
  
-  ???
+ There are two ways to replicate the bug:
+ 
+ Specific hardware:
+Chassis Cisco UCS 5108 AC2 Chassis
+Blades Cisco UCS B200
+IO module Cisco UCS 2408
+ 
+ Server load: an Ubuntu 20.04 cluster deployed with MAAS, Juju and
+ OpenStack.
+ 
+ 1) Reset a single chassis I/O module or fail over a fabric interconnect
+ (FI) for all chassis in the cluster. We have performed both tests.
+ 
+ Failing over a single chassis I/O module results in at least one node
+ locking up.
+ Af
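
A sketch of how the remote-port state can be watched from an affected blade
while the I/O module is reset (illustrative only, not part of the original
test plan; rport numbering and paths differ per host):

# port_state of each FC remote port exposed by scsi_transport_fc; it changes
# to Blocked before the "blocked FC remote port time out" path in the trace
grep . /sys/class/fc_remote_ports/rport-*/port_state
# Follow fnic / rport kernel messages while the failover is in progress
journalctl -k -f | grep -Ei 'fnic|rport'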

[Kernel-packages] [Bug 1834213] Re: After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP traffic to instances

2019-10-21 Thread Steven Parker
Workaround:

Load the module:
sudo modprobe nf_conntrack_ipv4

Confirm it is loaded:
lsmod | grep nf_conntrack_ipv4
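
To keep the module loaded across reboots, one option (a sketch: assumes
systemd-modules-load, which Xenial uses; the file name is arbitrary):

# Load nf_conntrack_ipv4 automatically at every boot
echo nf_conntrack_ipv4 | sudo tee /etc/modules-load.d/nf_conntrack_ipv4.conf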

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1834213

Title:
  After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP traffic
  to instances

Status in OpenStack neutron-openvswitch charm:
  Fix Committed
Status in neutron:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  With an environment running Xenial-Queens, and having just upgraded
  the linux-image-generic kernel for MDS patching, a few of our
  hypervisor hosts that were rebooted (3 out of 100) ended up dropping
  IP (tcp/udp) ingress traffic.

  It turns out that nf_conntrack module was loaded, but
  nf_conntrack_ipv4 was not loading, and the traffic was being dropped
  by this rule:

   table=72, n_packets=214989, priority=50,ct_state=+inv+trk
  actions=resubmit(,93)

  The ct_state "inv" means invalid conntrack state, which the manpage
  describes as:

   The state is invalid, meaning that the connection tracker
   couldn’t identify the connection. This flag is a catch-all
   for problems in the connection or the connection tracker,
   such as:

   • L3/L4 protocol handler is not loaded/unavailable.
     With the Linux kernel datapath, this may mean that
     the nf_conntrack_ipv4 or nf_conntrack_ipv6 modules
     are not loaded.

   • L3/L4 protocol handler determines that the packet
     is malformed.

   • Packets are unexpected length for protocol.

  It appears that patching the OS of a hypervisor that is not running
  instances may fail to update the initrd to load nf_conntrack_ipv4
  (and/or _ipv6).
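
  A quick way to check whether an affected hypervisor is in this state
  (a sketch; br-int is the usual Neutron/OVS integration bridge and the
  name may differ in other deployments):

  # Is the L3/L4 conntrack handler actually loaded?
  lsmod | grep nf_conntrack_ipv4
  # Are packets still hitting the +inv+trk drop rule in table 72?
  sudo ovs-ofctl dump-flows br-int table=72 | grep ct_state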

  I couldn't find anywhere in the charm code that this would be loaded
  unless the charm's "harden" option is used on nova-compute charm (see
  charmhelpers contrib/host templates).  It is unset in our environment,
  so we are not using any special module probing.

  Did nf_conntrack_ipv4 get split out from nf_conntrack in recent kernel
  upgrades or is it possible that the charm should define a modprobe
  file if we have the OVS firewall driver configured?

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1834213/+subscriptions



[Kernel-packages] [Bug 1834213] Re: After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP traffic to instances

2019-07-23 Thread Steven Parker
Kernel version

uname -r
4.4.0-150-generic


apt list --installed | fgrep image

cloud-image-utils/xenial-updates,now 0.27-0ubuntu25.1 all [installed,automatic]
genisoimage/xenial,now 9:1.1.11-3ubuntu1 amd64 [installed]
linux-image-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 
amd64 [installed,automatic]
linux-image-4.4.0-148-generic/xenial-updates,xenial-security,now 4.4.0-148.174 
amd64 [installed,automatic]
linux-image-4.4.0-150-generic/xenial-updates,xenial-security,now 4.4.0-150.176 
amd64 [installed,automatic]
linux-image-extra-4.4.0-137-generic/xenial-updates,xenial-security,now 
4.4.0-137.163 amd64 [installed,automatic]
linux-image-generic/now 4.4.0.150.158 amd64 [installed,upgradable to: 
4.4.0.154.162]
linux-signed-image-4.4.0-137-generic/xenial-updates,xenial-security,now 
4.4.0-137.163 amd64 [installed,automatic]
ubuntu-cloudimage-keyring/xenial,now 2013.11.11 all [installed]


openvswitch version

apt list --installed | fgrep vswitch

neutron-openvswitch-agent/now 2:12.0.5-0ubuntu1~cloud0 all 
[installed,upgradable to: 2:12.0.6-0ubuntu2~cloud0]
openvswitch-common/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 amd64 
[installed]
openvswitch-switch/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 amd64 
[installed]
python-openvswitch/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 all 
[installed]

Let me know if you need anything else.

Thanks,

Steven

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1834213

