This seems consistent with a SIGSEGV compounded by a worker-thread stack
overflow. In hopes of obtaining a clean core file, you might want to modify
the SIGSEGV handler to simply call abort() instead of trying to write a
post-mortem API dump, syslog a backtrace, etc.
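A minimal sketch of that suggestion, assuming a standalone Linux program rather than VPP itself (VPP's real handler is unix_signal_handler, visible in the syslog trace below; segv_abort_handler and install_segv_abort_handler are hypothetical names). The handler does nothing except call abort(), which is async-signal-safe. It runs on an alternate signal stack so that it can still execute if the faulting thread's own stack has overflowed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical minimal SIGSEGV handler: no syslog, no backtrace, no API
 * dump (none of which are async-signal-safe). abort() is async-signal-safe
 * and terminates the process with SIGABRT, producing a core file if core
 * dumps are enabled. */
static void segv_abort_handler(int sig)
{
    (void)sig;
    abort();
}

/* Give the calling thread an alternate signal stack, then install the
 * handler process-wide with SA_ONSTACK. sigaltstack() is per-thread, so
 * every worker that may overflow its stack needs its own alternate stack.
 * Returns 0 on success, -1 on failure. */
static int install_segv_abort_handler(void)
{
    stack_t ss;
    memset(&ss, 0, sizeof(ss));
    ss.ss_size = SIGSTKSZ;
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL)
        return -1;
    if (sigaltstack(&ss, NULL) != 0) {
        free(ss.ss_sp);
        return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_abort_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK; /* run the handler on the alternate stack */
    return sigaction(SIGSEGV, &sa, NULL) == 0 ? 0 : -1;
}
```

One way to check it: fork a child that installs the handler and then writes through a NULL pointer. The parent's waitpid() status should report termination by SIGABRT.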

 

Best of luck with it.  

 

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Stanislav Zaikin
Sent: Wednesday, February 1, 2023 6:17 AM
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] sigsegv and its handler

 

Hello folks,

 

I've been experiencing rare crashes (about one every three months), and it
looks like the heap is somehow corrupted. Sometimes the trace shows very
unexpected nodes (like ip6-map-t, although I don't configure any IPv6 MAP), and
sometimes it's just a crash inside ip4-rewrite-node.

 

After taking a look, I found that the last two crashes occurred in the same way:

1. A call to vnet_feature_arc_start_w_cfg_index or vnet_feature_arc_start

2. A call to vnet_get_config_data

 

Then VPP received and handled a SIGSEGV, which completely broke the stack
trace in the core dump for the corresponding worker:

#0  0x00007f44fa0812c6 in __GI_epoll_pwait (epfd=8, events=0x7f44babe52d8, maxevents=<optimized out>, timeout=9, set=0x7f44fa5c66f8 <linux_epoll_input_inline.unblock_all_signals>) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x000000089f6fab2b in ?? ()
#2  0x00007f44babe52d8 in ?? ()
#3  0x0000000900000100 in ?? ()
#4  0x00007f44fa5c66f8 in _vlib_init_function_init_linux_epoll_input_init () from /lib/x86_64-linux-gnu/libvlib.so.22.10.0
#5  0x0000000000000000 in ?? ()

 

So I can't analyze the core dump. Any ideas on how to catch this crash
correctly? Should I disable VPP's SIGSEGV handling? Or is there a way to
restore the worker's original stack trace?
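On the idea of disabling SIGSEGV: simply ignoring it does not help for a real fault, because POSIX leaves the behavior undefined and Linux still terminates the process. A different option, sketched here under the assumptions of Linux and a synchronous fault, is a one-shot handler installed with SA_RESETHAND. The handler writes a fixed message with write(2) and returns, so the faulting instruction runs again under the default action and the core is produced at the original fault. The function names are hypothetical, and this is not how VPP's handler works.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Log one fixed line with write(2), which is async-signal-safe, then return.
 * SA_RESETHAND has already reset the disposition to SIG_DFL, so on Linux the
 * faulting instruction re-executes, faults again, and the kernel terminates
 * the process with SIGSEGV (dumping core if enabled) at the original fault.
 * This only works for real faults, not for SIGSEGV sent with kill(). */
static void segv_log_and_refault(int sig)
{
    static const char msg[] = "fatal SIGSEGV, letting it re-fault for a core\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)rc;
    (void)sig;
}

/* Install the one-shot handler process-wide. Returns 0 on success. */
static int install_segv_log_and_refault(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_log_and_refault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND; /* reset to SIG_DFL when the handler runs */
    return sigaction(SIGSEGV, &sa, NULL) == 0 ? 0 : -1;
}
```

Compared with calling abort() from the handler, the resulting core should show the faulting frame directly rather than the handler and signal trampoline frames, at the cost of relying on Linux's re-execution behavior.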

 

For reference, here are the stack traces from syslog:

vnet[2856086]: received signal SIGSEGV, PC 0x7f44b76dbee3, faulting address 0xb0040114
vnet[2856086]: #0  0x00007f44fa43885b 0x7f44fa43885b (unix_signal_handler+379)
vnet[2856086]: #1  0x00007f44fa34f3c0 0x7f44fa34f3c0 (__funlockfile)
vnet[2856086]: #2  0x00007f44b76dbee3 0x7f44b76dbee3 (ip6_map_t+675)
vnet[2856086]: #3  0x00007f44fa3c86fb vlib_worker_loop + 0x1b3b
vnet[2856086]: #4  0x00007f44fa41aafa vlib_worker_thread_fn + 0xaa
vnet[2856086]: #5  0x00007f44fa414e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[2856086]: #6  0x00007f44fa343609 start_thread + 0xd9
vnet[2856086]: #7  0x00007f44fa081163 clone + 0x43

vnet[944491]: received signal SIGSEGV, PC 0x7faf922ca6ae, faulting address 0x7fb3519530fc
vnet[944491]: #0  0x00007faf9102785b 0x7faf9102785b
vnet[944491]: #1  0x00007faf90f3e3c0 0x7faf90f3e3c0
vnet[944491]: #2  0x00007faf922ca6ae ip4_rewrite_node_fn_skx + 0x149e
vnet[944491]: #3  0x00007faf90fb76fb vlib_worker_loop + 0x1b3b
vnet[944491]: #4  0x00007faf91009afa vlib_worker_thread_fn + 0xaa
vnet[944491]: #5  0x00007faf91003e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[944491]: #6  0x00007faf90f32609 start_thread + 0xd9
vnet[944491]: #7  0x00007faf90c70163 clone + 0x43

 

Line information:

Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address 0x7f44b76dbee3 <ip6_map_t+675> and ends at 0x7f44b76dbee7 <ip6_map_t+679>.

 

Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address 0x7f44fb6db6ae <ip4_rewrite_node_fn_skx+5278> and ends at 0x7f44fb6db6b1 <ip4_rewrite_node_fn_skx+5281>.
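Both faults resolve to the same line of config.h, inlined into two unrelated nodes, which may suggest that the bad input is the config data or index being read rather than either node's own logic. The sketch below is a hypothetical, simplified model of such a lookup, not VPP's vnet_get_config_data. It assumes each feature's entry is a run of data words followed by one next-node word, and it adds a bounds check so a corrupted index is reported instead of dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a feature-config heap; this is NOT
 * VPP's vnet_config_main_t. Each feature's entry is assumed to be
 * n_data_words of data followed by one word holding the next-node index. */
typedef struct {
    const uint32_t *words; /* packed config entries */
    size_t n_words;        /* number of valid words in 'words' */
} config_heap_t;

/* Bounds-checked lookup: return a pointer to the feature's data, store its
 * next-node index in *next_index and advance *config_index to the next
 * entry. Returns NULL, leaving both outputs untouched, if the entry would
 * fall outside the heap (for example because the index is corrupted). */
static const uint32_t *
get_config_data_checked(const config_heap_t *h, uint32_t *config_index,
                        uint32_t *next_index, uint32_t n_data_words)
{
    size_t i = *config_index;
    if (i >= h->n_words || n_data_words >= h->n_words - i)
        return NULL;
    *next_index = h->words[i + n_data_words];          /* trailing word */
    *config_index = (uint32_t)(i + n_data_words + 1);  /* next entry */
    return &h->words[i];
}
```

In a debug build, one might turn the NULL return into an assertion failure at the lookup itself, so that the core points at the first bad read rather than at whichever node happens to run next.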

 

-- 

Best regards
Stanislav Zaikin

View/Reply Online (#22533): https://lists.fd.io/g/vpp-dev/message/22533
