On Mon, May 21, 2018 at 11:12 AM, Johannes Erdfelt <[email protected]> wrote:
> I've been troubleshooting a kernel panic we've seen in our production
> environment. First the kernel panic.
>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:3254!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: zram vhost_vsock vmw_vsock_virtio_transport_common vsock
> nfnetlink_queue nfnetlink_log bluetooth iptable_nat xfs nf_conntrack_netlink
> nfnetlink ufs act_police cls_basic sch_ingress ebtable_filter ebtables
> ip6table_filter iptable_filter nbd ip6table_raw ip6_tables xt_CT iptable_raw
> ip_tables x_tables vport_stt(OE) openvswitch(OE) nf_nat_ipv6 nf_nat_ipv4
> nf_nat udp_tunnel dm_crypt ipmi_ssif bonding ipmi_devintf nf_conntrack_ftp
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 dcdbas intel_rapl
> nf_defrag_ipv4 sb_edac edac_core nf_conntrack x86_pkg_temp_thermal
> intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> dm_multipath aesni_intel aes_x86_64 lrw glue_helper ablk_helper kvm_intel
> cryptd intel_cstate intel_rapl_perf kvm irqbypass mei_me ipmi_si vhost_net
> mei lpc_ich ipmi_msghandler shpchp vhost acpi_power_meter macvtap mac_hid
> macvlan coretemp lp parport btrfs raid456 async_raid6_recov async_memcpy asyn
> crc32c raid0 multipath linear raid1 raid10 ses enclosure scsi_transport_sas
> sfc(OE) mtd ptp ahci pps_core libahci mdio wmi megaraid_sas(OE) fjes [last
> unloaded: zram]
> CPU: 10 PID: 39947 Comm: CPU 0/KVM Tainted: G OE K 4.9.77-1-generic
> #4
> Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.3.6 06/03/2015
> task: ffff9b01ed1eab80 task.stack: ffffa7c0a2b04000
> RIP: 0010:[<ffffffffc0734e17>] [<ffffffffc0734e17>] skb_segment+0xce7/0xed0
> RSP: 0018:ffff9b237f943618 EFLAGS: 00010246
> RAX: 00000000000089d5 RBX: ffff9b107c430f00 RCX: ffff9b107c431800
> RDX: ffff9b22a5ab0d00 RSI: 00000000000060e2 RDI: 0000000000000440
> RBP: ffff9b237f9436e8 R08: 00000000000060e2 R09: 000000000000626a
> R10: 0000000000005ca2 R11: 0000000000000000 R12: ffff9b11279396c0
> R13: ffff9b5360ff5500 R14: 00000000000060e2 R15: 0000000000000011
> FS: 00007f557e58f700(0000) GS:ffff9b237f940000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1d5f7d30f2 CR3: 00000006e4f9a000 CR4: 0000000000162670
> Stack:
> ffff9b107c431800 ffffffffffffffde fffffff400000000 ffff9b107c431800
> ffff9b00c625bdf0 00005b2e01b39740 0000000000000001 0000000000000088
> 0000000000b39740 0000000000000022 0000000000000009 ffff9b5300000000
> Call Trace:
> <IRQ> [<ffffffff94bc6137>] udp4_ufo_fragment+0x127/0x1a0
> [<ffffffff94bcf32d>] inet_gso_segment+0x16d/0x3c0
> [<ffffffff94b5293a>] skb_mac_gso_segment+0xaa/0x110
> [<ffffffff94b52a66>] __skb_gso_segment+0xc6/0x190
> [<ffffffff946760d0>] ? ep_read_events_proc+0xc0/0xc0
> [<ffffffffc0665b3f>] queue_gso_packets+0x7f/0x1b0 [openvswitch]
> [<ffffffffc069d88d>] ? udp_error+0x16d/0x1c0 [nf_conntrack]
> [<ffffffffc0695282>] ? nf_ct_get_tuple+0x82/0xa0 [nf_conntrack]
> [<ffffffffc069d910>] ? udp_packet+0x30/0x90 [nf_conntrack]
> [<ffffffffc066dabc>] ? flow_lookup.isra.6+0x7c/0xb0 [openvswitch]
> [<ffffffffc0697d95>] ? nf_conntrack_in+0x2d5/0x560 [nf_conntrack]
> [<ffffffffc0665dc1>] ovs_dp_upcall+0x31/0x60 [openvswitch]
> [<ffffffffc0665ef3>] ovs_dp_process_packet+0x103/0x120 [openvswitch]
> [<ffffffffc065f2d4>] do_execute_actions+0x834/0x1510 [openvswitch]
> [<ffffffffc066dabc>] ? flow_lookup.isra.6+0x7c/0xb0 [openvswitch]
> [<ffffffffc065fff3>] ovs_execute_actions+0x43/0x110 [openvswitch]
> [<ffffffffc0665e76>] ovs_dp_process_packet+0x86/0x120 [openvswitch]
> [<ffffffffc0670040>] ? netdev_port_receive+0x100/0x100 [openvswitch]
> [<ffffffffc066f576>] ovs_vport_receive+0x76/0xd0 [openvswitch]
> [<ffffffff94b4fc3c>] ? netif_rx+0x1c/0x70
> [<ffffffffc06703ec>] ? ovs_ip_tunnel_rcv+0x8c/0xe0 [openvswitch]
> [<ffffffff94b8ae2b>] ? nf_iterate+0x5b/0x70
> [<ffffffffc0672888>] ? nf_ip_hook+0x738/0xde0 [openvswitch]
> [<ffffffff94b91df9>] ? ip_rcv_finish+0x129/0x420
> [<ffffffff94b8ae9b>] ? nf_hook_slow+0x5b/0xa0
> [<ffffffffc066fff0>] netdev_port_receive+0xb0/0x100 [openvswitch]
> [<ffffffffc0670040>] ? netdev_port_receive+0x100/0x100 [openvswitch]
> [<ffffffffc0670078>] netdev_frame_hook+0x38/0x60 [openvswitch]
> [<ffffffff94b501b0>] __netif_receive_skb_core+0x220/0xac0
> [<ffffffffc028c1e0>] ? efx_fast_push_rx_descriptors+0x50/0x310 [sfc]
> [<ffffffff94b50a68>] __netif_receive_skb+0x18/0x60
> [<ffffffff94b51b99>] process_backlog+0x89/0x140
> [<ffffffff94b511ac>] net_rx_action+0x10c/0x360
> [<ffffffff94c6eb0f>] __do_softirq+0xdf/0x2bb
> [<ffffffffc0285642>] ? efx_ef10_msi_interrupt+0x62/0x70 [sfc]
> [<ffffffff94c6dc3b>] do_IRQ+0x8b/0xd0
> [<ffffffff94487816>] irq_exit+0xb6/0xc0
> [<ffffffff94c6b956>] common_interrupt+0x96/0x96
> <EOI> [<ffffffff94c6b798>] ? irq_entries_start+0x578/0x6a0
> [<ffffffffc07b367b>] ? vmx_handle_external_intr+0x5b/0x60 [kvm_intel]
> [<ffffffffc052fe86>] vcpu_enter_guest+0x396/0x1290 [kvm]
> [<ffffffffc0536e07>] kvm_arch_vcpu_ioctl_run+0xb7/0x3d0 [kvm]
> [<ffffffffc051c6cf>] kvm_vcpu_ioctl+0x2af/0x570 [kvm]
> [<ffffffff94508362>] ? do_futex+0xb2/0x520
> [<ffffffff94641bb9>] do_vfs_ioctl+0x99/0x5f0
> [<ffffffffc052c6bf>] ? kvm_on_user_return+0x6f/0xa0 [kvm]
> [<ffffffff94642189>] SyS_ioctl+0x79/0x90
> [<ffffffff94c6aee4>] entry_SYSCALL_64_fastpath+0x24/0xcf
> Code: 89 87 e0 00 00 00 49 8b 57 60 48 8b 43 60 48 89 53 60 49 89 47 60 49 8b
> 57 18 48 8b 43 18 48 89 53 18 49 89 47 18 e9 fa fb ff ff <0f> 0b 44 89 ee 48
> 89 df e8 6c 9a 40 d4 85 c0 0f 84 78 fe ff ff
> RIP [<ffffffffc0734e17>] skb_segment+0xce7/0xed0
> RSP <ffff9b237f943618>
> ---[ end trace f0d2cc8df9be8c23 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x13400000 from 0xffffffff81000000 (relocation range:
> 0xffffffff80000000-0xffffffffbfffffff)
> Rebooting in 10 seconds..
> ACPI MEMORY or I/O RESET_REG.
>
> We are running a 4.9.77 kernel with one patch backported.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/core/skbuff.c?h=v4.17-rc6&id=13acc94eff122b260a387d68668bf9d670738e6a
>
> This patch is to a fix a different kernel panic using STT, however the
> panic is still reproducible without this patch applied.
>
> We are using stock Open vSwitch 2.7.3.
>
> The panic is very reproducible, but it does require some configuration.
>
> In our case, we have two hosts acting as hypervisors. Each host has one
> guest VM. An STT tunnel is setup between the two hosts attached to each
> guest. One guest will act as a source and one will act as a destination.
> The destination has connection tracking setup in the flows. We have a
> script running `ovs-dpctl del-flows` in a loop to make reproducing the
> crash easier, but it's not strictly necessary. (This is just to make
> it easier for an upcall to occur, see below)
>
> The source guest then sends a couple of very large (>60k) UDP packets.
> The destination host then crashes with the above panic.
>
> The crash is a result of an skb that is not understood by skb_segment
> in net/core/skbuff.c.
>
> The skb comes from the solarflare NIC as a large skb, requiring the use
> of frag_list. It looks something like this (note the use of frag_list):
>
> skb: ffff92544a0bf000
> len: 60177, data_len: 60169, nr_frags: 17
> frag_list: ffff92544a0bed00
>
> skb: ffff92544a0bed00
> len: 24820, data_len: 24820, nr_frags: 17
> next: ffff92544a0be700
>
> skb: ffff92544a0be700
> len: 10589, data_len: 10589, nr_frags: 8
>
> It winds its way through the networking core and openvswitch (stripping
> off the outer STT encapsulation) eventually requiring an upcall.
>

Thanks for the details.
I am not sure why you are seeing frag_list for a large UDP packet; STT
would linearize such a packet:
https://github.com/openvswitch/ovs/blob/06db81ccfe6d4c779de2ca73033abd7020db419b/datapath/linux/compat/stt.c#L299

Can you check why you are still seeing frag_list after STT decapsulation?
Is SKIP_ZERO_COPY defined for the STT build?

Maybe the assumption in try_to_segment() does not hold for skbs generated
by certain NICs. Ref:
https://github.com/openvswitch/ovs/blob/06db81ccfe6d4c779de2ca73033abd7020db419b/datapath/linux/compat/stt.c#L566

In that case we need to fix STT.
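
For reference, here is a rough user-space sketch (Python, not kernel code) of the length accounting in the skb dump quoted above. It assumes the usual sk_buff convention that len and data_len on the head skb already include the bytes held in frag_list, and that the linear part is len - data_len. The Skb class and helper names are made up for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skb:
    """Hypothetical stand-in for the sk_buff fields shown in the dump."""
    len: int
    data_len: int
    nr_frags: int
    frag_list: Optional["Skb"] = None  # first skb of the chain (head skb only)
    next: Optional["Skb"] = None       # next skb within the frag_list chain


def headlen(skb: Skb) -> int:
    # Bytes in the linear area (assumed: len minus non-linear bytes).
    return skb.len - skb.data_len


def frag_list_bytes(skb: Skb) -> int:
    # Sum the lengths of every skb chained off frag_list.
    total = 0
    cur = skb.frag_list
    while cur is not None:
        total += cur.len
        cur = cur.next
    return total


def paged_bytes(skb: Skb) -> int:
    # Non-linear bytes that live in the head skb's own page frags.
    return skb.data_len - frag_list_bytes(skb)


# Values copied from the dump in the quoted report.
third = Skb(len=10589, data_len=10589, nr_frags=8)
second = Skb(len=24820, data_len=24820, nr_frags=17, next=third)
head = Skb(len=60177, data_len=60169, nr_frags=17, frag_list=second)

print("linear bytes:   ", headlen(head))
print("frag_list bytes:", frag_list_bytes(head))
print("paged bytes:    ", paged_bytes(head))
```

Under those assumptions the head skb carries only 8 linear bytes, with 35409 bytes in the frag_list chain and 24760 bytes in its own page frags, so the packet reaching the upcall path is clearly not linear.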
