On Mon, May 21, 2018 at 11:12 AM, Johannes Erdfelt <[email protected]> wrote:
> I've been troubleshooting a kernel panic we've seen in our production
> environment. First the kernel panic.
>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:3254!
> invalid opcode: 0000 [#1] SMP
> Modules linked in: zram vhost_vsock vmw_vsock_virtio_transport_common vsock 
> nfnetlink_queue nfnetlink_log bluetooth iptable_nat xfs nf_conntrack_netlink 
> nfnetlink ufs act_police cls_basic sch_ingress ebtable_filter ebtables 
> ip6table_filter iptable_filter nbd ip6table_raw ip6_tables xt_CT iptable_raw 
> ip_tables x_tables vport_stt(OE) openvswitch(OE) nf_nat_ipv6 nf_nat_ipv4 
> nf_nat udp_tunnel dm_crypt ipmi_ssif bonding ipmi_devintf nf_conntrack_ftp 
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 dcdbas intel_rapl 
> nf_defrag_ipv4 sb_edac edac_core nf_conntrack x86_pkg_temp_thermal 
> intel_powerclamp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
> dm_multipath aesni_intel aes_x86_64 lrw glue_helper ablk_helper kvm_intel 
> cryptd intel_cstate intel_rapl_perf kvm irqbypass mei_me ipmi_si vhost_net 
> mei lpc_ich ipmi_msghandler shpchp vhost acpi_power_meter macvtap mac_hid 
> macvlan coretemp lp parport btrfs raid456 async_raid6_recov async_memcpy asyn
> crc32c raid0 multipath linear raid1 raid10 ses enclosure scsi_transport_sas 
> sfc(OE) mtd ptp ahci pps_core libahci mdio wmi megaraid_sas(OE) fjes [last 
> unloaded: zram]
> CPU: 10 PID: 39947 Comm: CPU 0/KVM Tainted: G           OE K 4.9.77-1-generic 
> #4
> Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 1.3.6 06/03/2015
> task: ffff9b01ed1eab80 task.stack: ffffa7c0a2b04000
> RIP: 0010:[<ffffffffc0734e17>]  [<ffffffffc0734e17>] skb_segment+0xce7/0xed0
> RSP: 0018:ffff9b237f943618  EFLAGS: 00010246
> RAX: 00000000000089d5 RBX: ffff9b107c430f00 RCX: ffff9b107c431800
> RDX: ffff9b22a5ab0d00 RSI: 00000000000060e2 RDI: 0000000000000440
> RBP: ffff9b237f9436e8 R08: 00000000000060e2 R09: 000000000000626a
> R10: 0000000000005ca2 R11: 0000000000000000 R12: ffff9b11279396c0
> R13: ffff9b5360ff5500 R14: 00000000000060e2 R15: 0000000000000011
> FS:  00007f557e58f700(0000) GS:ffff9b237f940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1d5f7d30f2 CR3: 00000006e4f9a000 CR4: 0000000000162670
> Stack:
>  ffff9b107c431800 ffffffffffffffde fffffff400000000 ffff9b107c431800
>  ffff9b00c625bdf0 00005b2e01b39740 0000000000000001 0000000000000088
>  0000000000b39740 0000000000000022 0000000000000009 ffff9b5300000000
> Call Trace:
>  <IRQ> [<ffffffff94bc6137>] udp4_ufo_fragment+0x127/0x1a0
>  [<ffffffff94bcf32d>] inet_gso_segment+0x16d/0x3c0
>  [<ffffffff94b5293a>] skb_mac_gso_segment+0xaa/0x110
>  [<ffffffff94b52a66>] __skb_gso_segment+0xc6/0x190
>  [<ffffffff946760d0>] ? ep_read_events_proc+0xc0/0xc0
>  [<ffffffffc0665b3f>] queue_gso_packets+0x7f/0x1b0 [openvswitch]
>  [<ffffffffc069d88d>] ? udp_error+0x16d/0x1c0 [nf_conntrack]
>  [<ffffffffc0695282>] ? nf_ct_get_tuple+0x82/0xa0 [nf_conntrack]
>  [<ffffffffc069d910>] ? udp_packet+0x30/0x90 [nf_conntrack]
>  [<ffffffffc066dabc>] ? flow_lookup.isra.6+0x7c/0xb0 [openvswitch]
>  [<ffffffffc0697d95>] ? nf_conntrack_in+0x2d5/0x560 [nf_conntrack]
>  [<ffffffffc0665dc1>] ovs_dp_upcall+0x31/0x60 [openvswitch]
>  [<ffffffffc0665ef3>] ovs_dp_process_packet+0x103/0x120 [openvswitch]
>  [<ffffffffc065f2d4>] do_execute_actions+0x834/0x1510 [openvswitch]
>  [<ffffffffc066dabc>] ? flow_lookup.isra.6+0x7c/0xb0 [openvswitch]
>  [<ffffffffc065fff3>] ovs_execute_actions+0x43/0x110 [openvswitch]
>  [<ffffffffc0665e76>] ovs_dp_process_packet+0x86/0x120 [openvswitch]
>  [<ffffffffc0670040>] ? netdev_port_receive+0x100/0x100 [openvswitch]
>  [<ffffffffc066f576>] ovs_vport_receive+0x76/0xd0 [openvswitch]
>  [<ffffffff94b4fc3c>] ? netif_rx+0x1c/0x70
>  [<ffffffffc06703ec>] ? ovs_ip_tunnel_rcv+0x8c/0xe0 [openvswitch]
>  [<ffffffff94b8ae2b>] ? nf_iterate+0x5b/0x70
>  [<ffffffffc0672888>] ? nf_ip_hook+0x738/0xde0 [openvswitch]
>  [<ffffffff94b91df9>] ? ip_rcv_finish+0x129/0x420
>  [<ffffffff94b8ae9b>] ? nf_hook_slow+0x5b/0xa0
>  [<ffffffffc066fff0>] netdev_port_receive+0xb0/0x100 [openvswitch]
>  [<ffffffffc0670040>] ? netdev_port_receive+0x100/0x100 [openvswitch]
>  [<ffffffffc0670078>] netdev_frame_hook+0x38/0x60 [openvswitch]
>  [<ffffffff94b501b0>] __netif_receive_skb_core+0x220/0xac0
>  [<ffffffffc028c1e0>] ? efx_fast_push_rx_descriptors+0x50/0x310 [sfc]
>  [<ffffffff94b50a68>] __netif_receive_skb+0x18/0x60
>  [<ffffffff94b51b99>] process_backlog+0x89/0x140
>  [<ffffffff94b511ac>] net_rx_action+0x10c/0x360
>  [<ffffffff94c6eb0f>] __do_softirq+0xdf/0x2bb
>  [<ffffffffc0285642>] ? efx_ef10_msi_interrupt+0x62/0x70 [sfc]
>  [<ffffffff94c6dc3b>] do_IRQ+0x8b/0xd0
>  [<ffffffff94487816>] irq_exit+0xb6/0xc0
>  [<ffffffff94c6b956>] common_interrupt+0x96/0x96
>  <EOI> [<ffffffff94c6b798>] ?  irq_entries_start+0x578/0x6a0
>  [<ffffffffc07b367b>] ? vmx_handle_external_intr+0x5b/0x60 [kvm_intel]
>  [<ffffffffc052fe86>] vcpu_enter_guest+0x396/0x1290 [kvm]
>  [<ffffffffc0536e07>] kvm_arch_vcpu_ioctl_run+0xb7/0x3d0 [kvm]
>  [<ffffffffc051c6cf>] kvm_vcpu_ioctl+0x2af/0x570 [kvm]
>  [<ffffffff94508362>] ? do_futex+0xb2/0x520
>  [<ffffffff94641bb9>] do_vfs_ioctl+0x99/0x5f0
>  [<ffffffffc052c6bf>] ? kvm_on_user_return+0x6f/0xa0 [kvm]
>  [<ffffffff94642189>] SyS_ioctl+0x79/0x90
>  [<ffffffff94c6aee4>] entry_SYSCALL_64_fastpath+0x24/0xcf
> Code: 89 87 e0 00 00 00 49 8b 57 60 48 8b 43 60 48 89 53 60 49 89 47 60 49 8b 
> 57 18 48 8b 43 18 48 89 53 18 49 89 47 18 e9 fa fb ff ff <0f> 0b 44 89 ee 48 
> 89 df e8 6c 9a 40 d4 85 c0 0f 84 78 fe ff ff
> RIP  [<ffffffffc0734e17>] skb_segment+0xce7/0xed0
>  RSP <ffff9b237f943618>
> ---[ end trace f0d2cc8df9be8c23 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x13400000 from 0xffffffff81000000 (relocation range: 
> 0xffffffff80000000-0xffffffffbfffffff)
> Rebooting in 10 seconds..
> ACPI MEMORY or I/O RESET_REG.
>
> We are running a 4.9.77 kernel with one patch backported.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/core/skbuff.c?h=v4.17-rc6&id=13acc94eff122b260a387d68668bf9d670738e6a
>
> This patch fixes a different kernel panic seen with STT; however, the
> panic is still reproducible without this patch applied.
>
> We are using stock Open vSwitch 2.7.3.
>
> The panic is very reproducible, but it does require some configuration.
>
> In our case, we have two hosts acting as hypervisors. Each host has one
> guest VM. An STT tunnel is set up between the two hosts attached to each
> guest. One guest will act as a source and one will act as a destination.
> The destination has connection tracking set up in the flows. We have a
> script running `ovs-dpctl del-flows` in a loop to make reproducing the
> crash easier, but it's not strictly necessary. (This is just to make
> it easier for an upcall to occur, see below)
>
> The source guest then sends a couple of very large (>60k) UDP packets.
> The destination host then crashes with the above panic.
>
> The crash is a result of an skb that is not understood by skb_segment
> in net/core/skbuff.c.
>
> The skb comes from the solarflare NIC as a large skb, requiring the use
> of frag_list. It looks something like this (note the use of frag_list):
>
> skb: ffff92544a0bf000
>   len: 60177, data_len: 60169, nr_frags: 17
>   frag_list: ffff92544a0bed00
>
> skb: ffff92544a0bed00
>   len: 24820, data_len: 24820, nr_frags: 17
>   next: ffff92544a0be700
>
> skb: ffff92544a0be700
>   len: 10589, data_len: 10589, nr_frags: 8
>
> It winds its way through the networking core and openvswitch (stripping
> off the outer STT encapsulation) eventually requiring an upcall.
>
Thanks for the details.

I am not sure why you are seeing frag_list for a large UDP packet; STT
should linearize such a packet:
https://github.com/openvswitch/ovs/blob/06db81ccfe6d4c779de2ca73033abd7020db419b/datapath/linux/compat/stt.c#L299
Can you check why frag_list is still present after STT decapsulation?


Is SKIP_ZERO_COPY defined for the STT build?
Maybe the assumption in try_to_segment() does not hold for skbs
generated by certain NICs.
Ref:
https://github.com/openvswitch/ovs/blob/06db81ccfe6d4c779de2ca73033abd7020db419b/datapath/linux/compat/stt.c#L566
If so, we need to fix STT.