[Kernel-packages] [Bug 1944005] Re: System Lockup during failover
*** This bug is a duplicate of bug 1944586 ***
    https://bugs.launchpad.net/bugs/1944586

Duplicate of a bug which is now fixed in stable:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1944586/comments/8

** This bug has been marked a duplicate of bug 1944586
   kernel bug found when disconnecting one fiber channel interface on
   Cisco Chassis with fnic DRV_VERSION "1.6.0.47"

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1944005

Title:
  System Lockup during failover

Status in linux package in Ubuntu:
  Incomplete
Status in multipath-tools package in Ubuntu:
  Confirmed

Bug description:
  When testing failover for configured multipath devices, I am seeing
  some nodes lock up and become unresponsive. These nodes have frozen
  consoles, but can be pinged. So some of the network stack is
  functional, but ssh to them is not possible.

  1) 20.04 focal
  2) 0.8.3-1ubuntu2
  3) multipath failover would be successful across all nodes and the
     nodes would continue to be responsive
  4) some of the nodes lock up when attempting to fail over and cannot
     be reached via ssh or console

  Here is the ubuntu-bug output: https://pastebin.canonical.com/p/6MyWBggtS2/
  Here is the kern.log: https://pastebin.canonical.com/p/rG42gzXG9X/
  Here is the syslog: https://pastebin.canonical.com/p/C99ZprphZn/

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1944005/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1944586] Re: kernel bug found when disconnecting one fiber channel interface on Cisco Chassis with fnic DRV_VERSION "1.6.0.47"
** Description changed:

  [Impact]

  It has been brought to my attention the following:

  "
  We have been experiencing node lockups and degradation when testing
  fiber channel failover for multi-path PURESTORAGE drives. Testing
  usually consists of either failing over the fabric or the local I/O
  module for the Cisco chassis, which houses a number of individual
  blades. After rebooting a local chassis I/O module we see commands
  like multipath -ll hanging. Resetting the blade's individual fiber
  channel interface results in the following messages.
  "

  [6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding
  [6051160.252901] BUG: kernel NULL pointer dereference, address: 0040
  [6051160.262267] #PF: supervisor read access in kernel mode
  [6051160.269314] #PF: error_code(0x) - not-present page
  [6051160.276016] PGD 0 P4D 0
  [6051160.279807] Oops: [#1] SMP NOPTI
  [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu
  [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020
  [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc]
  [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic]
  [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01
  [6051160.346553] RSP: 0018:bc224f297d90 EFLAGS: 00010082
  [6051160.353115] RAX: RBX: 90abdd4c4b00 RCX: 90d8ab2c2bb0
  [6051160.361983] RDX: 90d8b5467400 RSI: RDI: 90d8ab3b4b40
  [6051160.370812] RBP: bc224f297df8 R08: 90d8c08978c8 R09: 90d8b8850800
  [6051160.379518] R10: 90d8a59d64c0 R11: 0001 R12: 90d8ab2c31f8
  [6051160.388242] R13: R14: 0246 R15: 90d8ab2c27b8
  [6051160.396953] FS: () GS:90d8c088() knlGS:
  [6051160.406838] CS: 0010 DS: ES: CR0: 80050033
  [6051160.414168] CR2: 0040 CR3: 000fc1c0a004 CR4: 007626e0
  [6051160.423146] DR0: DR1: DR2:
  [6051160.431884] DR3: DR6: fffe0ff0 DR7: 0400
  [6051160.440615] PKRU: 5554
  [6051160.444337] Call Trace:
  [6051160.447841]  fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc]
  [6051160.455263]  fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc]
  [6051160.463623]  process_one_work+0x1eb/0x3b0
  [6051160.468784]  worker_thread+0x4d/0x400
  [6051160.473660]  kthread+0x104/0x140
  [6051160.478102]  ? process_one_work+0x3b0/0x3b0
  [6051160.483439]  ? kthread_park+0x90/0x90
  [6051160.488213]  ret_from_fork+0x1f/0x40
  [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O)
    ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat
    vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6
    nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables
    nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length
    dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit
    x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me
    ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables
    x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
  [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper
    i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect
    ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm
    glue_helper enic ahci lpc_ich libahci wmi
  [6051160.632623] CR2: 0040
  [6051160.637043] ---[ end trace 236e6f4850146477 ]---
  [Test Plan]

- ???
+ There are two ways to replicate the bug:
+
+ Specific hardware:
+   Chassis:   Cisco UCS 5108 AC2 Chassis
+   Blades:    Cisco UCS B200
+   IO module: Cisco UCS 2408
+
+ Server loads - Ubuntu 20.04 cluster running deployed MAAS, Juju and
+ OpenStack.
+
+ 1) Reset a single chassis I/O module or fail over a fabric interconnect
+ (FI) for all chassis in the cluster. We have performed both tests.
+
+ Failover of a single chassis I/O module results in at least one node
+ locking up.
+ Af
[Kernel-packages] [Bug 1834213] Re: After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP traffic to instances
Workaround:

  Load:    sudo modprobe nf_conntrack_ipv4
  Confirm: lsmod | grep nf_conntrack_ipv4

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1834213

Title:
  After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP
  traffic to instances

Status in OpenStack neutron-openvswitch charm:
  Fix Committed
Status in neutron:
  New
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  With an environment running Xenial-Queens, and having just upgraded
  the linux-image-generic kernel for MDS patching, a few of our
  hypervisor hosts that were rebooted (3 out of 100) ended up dropping
  ingress IP (tcp/udp) traffic.

  It turns out that the nf_conntrack module was loaded, but
  nf_conntrack_ipv4 was not, and the traffic was being dropped by this
  rule:

    table=72, n_packets=214989, priority=50,ct_state=+inv+trk actions=resubmit(,93)

  The ct_state "inv" flag means invalid conntrack state, which the
  manpage describes as:

    The state is invalid, meaning that the connection tracker couldn't
    identify the connection. This flag is a catch-all for problems in
    the connection or the connection tracker, such as:

    • L3/L4 protocol handler is not loaded/unavailable. With the Linux
      kernel datapath, this may mean that the nf_conntrack_ipv4 or
      nf_conntrack_ipv6 modules are not loaded.

    • L3/L4 protocol handler determines that the packet is malformed.

    • Packets are unexpected length for protocol.

  It appears that patching the OS of a hypervisor that is not running
  instances may fail to update the initrd to load nf_conntrack_ipv4
  (and/or _ipv6). I couldn't find anywhere in the charm code that this
  module would be loaded unless the charm's "harden" option is used on
  the nova-compute charm (see charmhelpers contrib/host templates). It
  is unset in our environment, so we are not using any special module
  probing.
  Did nf_conntrack_ipv4 get split out from nf_conntrack in recent
  kernel upgrades, or should the charm define a modprobe file when the
  OVS firewall driver is configured?

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1834213/+subscriptions
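The modprobe workaround above only loads the module for the running session. On systems booted with systemd, a drop-in under /etc/modules-load.d is the standard way to make it persist across reboots — a minimal sketch; the file name is illustrative, not from the bug report:

```shell
# /etc/modules-load.d/conntrack.conf  (illustrative file name)
# Load the conntrack L3 handler at boot so OVS ct_state matches work
# even before anything else triggers the module load.
nf_conntrack_ipv4
```

systemd-modules-load reads every *.conf in that directory at boot, one module name per line, so this avoids relying on the initrd or charm code to pull the module in.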
[Kernel-packages] [Bug 1834213] Re: After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP traffic to instances
Kernel version:

  uname -r
  4.4.0-150-generic

  apt list --installed | fgrep image
  cloud-image-utils/xenial-updates,now 0.27-0ubuntu25.1 all [installed,automatic]
  genisoimage/xenial,now 9:1.1.11-3ubuntu1 amd64 [installed]
  linux-image-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 amd64 [installed,automatic]
  linux-image-4.4.0-148-generic/xenial-updates,xenial-security,now 4.4.0-148.174 amd64 [installed,automatic]
  linux-image-4.4.0-150-generic/xenial-updates,xenial-security,now 4.4.0-150.176 amd64 [installed,automatic]
  linux-image-extra-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 amd64 [installed,automatic]
  linux-image-generic/now 4.4.0.150.158 amd64 [installed,upgradable to: 4.4.0.154.162]
  linux-signed-image-4.4.0-137-generic/xenial-updates,xenial-security,now 4.4.0-137.163 amd64 [installed,automatic]
  ubuntu-cloudimage-keyring/xenial,now 2013.11.11 all [installed]

openvswitch version:

  apt list --installed | fgrep vswitch
  neutron-openvswitch-agent/now 2:12.0.5-0ubuntu1~cloud0 all [installed,upgradable to: 2:12.0.6-0ubuntu2~cloud0]
  openvswitch-common/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 amd64 [installed]
  openvswitch-switch/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 amd64 [installed]
  python-openvswitch/xenial-updates,now 2.9.2-0ubuntu0.18.04.3~cloud0 all [installed]

Let me know if you need anything else.

Thanks,
Steven

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1834213

Title:
  After kernel upgrade, nf_conntrack_ipv4 module unloaded, no IP
  traffic to instances

Status in OpenStack neutron-openvswitch charm:
  Incomplete
Status in linux package in Ubuntu:
  Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-neutron-openvswitch/+bug/1834213/+subscriptions
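The `ct_state=+inv+trk` match quoted in the bug description is a list of flags each prefixed with `+` (must be set) or `-` (must be unset). A minimal illustrative parser of that syntax — the function name is my own, not part of any OVS library:

```python
def parse_ct_state(match: str) -> dict:
    """Parse an OVS ct_state match like 'ct_state=+inv+trk' into a dict
    mapping each flag name to True (required set, '+') or
    False (required unset, '-')."""
    value = match.split("=", 1)[1]
    # Put a space before each sign so the flags split cleanly.
    tokens = value.replace("+", " +").replace("-", " -").split()
    return {token[1:]: token[0] == "+" for token in tokens}
```

For the rule in question, `parse_ct_state("ct_state=+inv+trk")` reports that both `inv` (connection tracker could not classify the packet) and `trk` (packet passed through conntrack) must be set, which is exactly the state produced when the L3 conntrack handler module is missing.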