Daryl Wang via discuss <[email protected]> writes:

> We noticed that an OVS datapath stopped responding to stats request. On
> checking dmesg, we found that OVS ran into a null pointer in
> ovs_flow_alloc while in ovs_packet_cmd_execute. Is this a known bug?

Not as far as I am aware.

> We have not seen the failure again, so we don't have a good sense of what
> triggers the error. Logs didn't record any noteworthy changes to the
> datapath around the time of failure.
> Open vSwitch version is 2.11.2. Unfortunately we did not have kernel
> debugging enabled at the time of the crash:
>
> May 15 09:23:14 hostname kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
> May 15 09:23:14 hostname kernel: #PF: supervisor read access in kernel mode
> May 15 09:23:14 hostname kernel: #PF: error_code(0x0000) - not-present page
> May 15 09:23:14 hostname kernel: PGD 0 P4D 0
> May 15 09:23:14 hostname kernel: Oops: 0000 [#1] SMP PTI
> May 15 09:23:14 hostname kernel: CPU: 8 PID: 158558 Comm: handler91 Tainted: G O 5.2.17-1rodete3-amd64 #1 Debian 5.2.17-1rodete3
> May 15 09:23:14 hostname kernel: Hardware name: "Hardware information"
> May 15 09:23:14 hostname kernel: RIP: 0010:kmem_cache_alloc_node+0x7e/0x1f0

Note that this is the kmem_cache infra that gets either the flow or
flow_stats object during allocation.

How easily can you reproduce this? It looks like something broke that
cache object - was something unloading / reloading the ovs module during
this time? Just wondering how to reproduce it.
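For anyone reading the quoted oops header, the "#PF: error_code(0x0000)" line can be decoded by hand. Below is a minimal Python sketch, assuming the standard x86 page-fault error-code bit layout (low five bits only). The function name decode_pf_error_code is made up for illustration and is not part of OVS or the kernel.

```python
# Sketch only: decode the low bits of an x86 page-fault error code.
# Bit layout assumed from the x86 architecture definition; the helper
# name below is hypothetical.
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: supervisor access, 1: user access
PF_RSVD = 1 << 3   # 1: reserved bit set in a paging structure
PF_INSTR = 1 << 4  # 1: fault was an instruction fetch


def decode_pf_error_code(code: int) -> dict:
    """Return a human-readable breakdown of a page-fault error code."""
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    return {
        "cause": "protection violation" if code & PF_PROT else "not-present page",
        "access": access,
        "mode": "user" if code & PF_USER else "supervisor",
        "reserved_bit_set": bool(code & PF_RSVD),
    }


if __name__ == "__main__":
    # Value taken from the quoted log: error_code(0x0000)
    print(decode_pf_error_code(0x0000))
```

For 0x0000 this gives a supervisor read of a not-present page, which matches the two "#PF:" lines in the log and is what a read through a NULL pointer in kernel code would look like.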

> May 15 09:23:14 hostname kernel: Code: 75 01 00 00 4d 8b 07 65 49 8b 50 08 65 4c 03 05 70 0e fc 74 4d 8b 30 4d 85 f6 74 1a 41 83 fc ff 0f 84 83 00 00 00 49 8b 40 10 <48> 8b 00 48 c1 e8 3a 41 39 c4 74 73 48 8b 0c 24 44 89 e2 89 ee 4c
> May 15 09:23:14 hostname kernel: RSP: 0018:ffffbc910e0afa68 EFLAGS: 00010213
> May 15 09:23:14 hostname kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> May 15 09:23:14 hostname kernel: RDX: 0000000000043ce7 RSI: 0000000000000dc0 RDI: ffff9a7f80126280
> May 15 09:23:14 hostname kernel: RBP: 0000000000000dc0 R08: ffffdc90fc317330 R09: ffff9a7b06225b60
> May 15 09:23:14 hostname kernel: R10: 0000000000000003 R11: ffffffffc0c5e510 R12: 0000000000000000
> May 15 09:23:14 hostname kernel: R13: ffff9a7f80126280 R14: ffff9a7f64dbada0 R15: ffff9a7f80126280
> May 15 09:23:14 hostname kernel: FS: 00007fdb877fe700(0000) GS:ffff9a803f100000(0000) knlGS:0000000000000000
> May 15 09:23:14 hostname kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> May 15 09:23:14 hostname kernel: CR2: 0000000000000000 CR3: 0000000ed7d46004 CR4: 00000000003606e0
> May 15 09:23:14 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May 15 09:23:14 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> May 15 09:23:14 hostname kernel: Call Trace:
> May 15 09:23:14 hostname kernel: ? ovs_flow_alloc+0x4d/0x90 [openvswitch]
> May 15 09:23:14 hostname kernel: ovs_flow_alloc+0x4d/0x90 [openvswitch]
> May 15 09:23:14 hostname kernel: ovs_packet_cmd_execute+0xd0/0x2a0 [openvswitch]
> May 15 09:23:14 hostname kernel: ? _cond_resched+0x15/0x30
> May 15 09:23:14 hostname kernel: genl_family_rcv_msg+0x1d2/0x410
> May 15 09:23:14 hostname kernel: ? recalibrate_cpu_khz+0x10/0x10
> May 15 09:23:14 hostname kernel: ? ktime_get_raw_ts64+0x32/0xc0
> May 15 09:23:14 hostname kernel: genl_rcv_msg+0x47/0x90
> May 15 09:23:14 hostname kernel: ? __kmalloc_node_track_caller+0x1cb/0x290
> May 15 09:23:14 hostname kernel: ? genl_family_rcv_msg+0x410/0x410
> May 15 09:23:14 hostname kernel: netlink_rcv_skb+0x49/0x110
> May 15 09:23:14 hostname kernel: genl_rcv+0x24/0x40
> May 15 09:23:15 hostname kernel: netlink_unicast+0x17e/0x200
> May 15 09:23:15 hostname kernel: netlink_sendmsg+0x204/0x3d0
> May 15 09:23:15 hostname kernel: sock_sendmsg+0x4c/0x50
> May 15 09:23:15 hostname kernel: ___sys_sendmsg+0x29f/0x300
> May 15 09:23:15 hostname kernel: ? ep_send_events_proc+0xf7/0x250
> May 15 09:23:15 hostname kernel: ? ep_read_events_proc+0xe0/0xe0
> May 15 09:23:15 hostname kernel: ? ep_scan_ready_list.constprop.21+0x1fe/0x230
> May 15 09:23:15 hostname kernel: __sys_sendmsg+0x57/0xa0
> May 15 09:23:15 hostname kernel: do_syscall_64+0x53/0x130
> May 15 09:23:15 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> May 15 09:23:15 hostname kernel: RIP: 0033:0x7fdd0141f22d
> May 15 09:23:15 hostname kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 0a ed ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 3e ed ff ff 48
> May 15 09:23:15 hostname kernel: RSP: 002b:00007fdb8779f100 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> May 15 09:23:15 hostname kernel: RAX: ffffffffffffffda RBX: 00007fdb8779ff80 RCX: 00007fdd0141f22d
> May 15 09:23:15 hostname kernel: RDX: 0000000000000000 RSI: 00007fdb8779f190 RDI: 0000000000000015
>
> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
