Hello all,

We are experiencing VPP crashes that occur a few days after startup because of
API segment exhaustion. Increasing the API segment size to 256MB didn't stop
the crashes.

Could you please take a look at the description below and tell us whether you
have seen similar issues or have any idea what the cause might be?

Given:

   - VPP 22.10
   - 2 worker threads
   - API segment size is 256MB
   - ~893k IPv4 routes and ~160k IPv6 routes added


Backtrace:

> [..]
> #32660 0x000055b02f606896 in os_panic () at /home/jenkins/tnsr-pkgs/work/vpp/src/vpp/vnet/main.c:414
> #32661 0x00007fce3c0ec740 in clib_mem_heap_alloc_inline (heap=0x0, size=<optimized out>, align=8, os_out_of_memory_on_failure=1) at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:613
> #32662 clib_mem_alloc (size=<optimized out>) at /home/jenkins/tnsr-pkgs/work/vpp/src/vppinfra/mem_dlmalloc.c:628
> #32663 0x00007fce3dc4ee6f in vl_msg_api_alloc_internal (vlib_rp=0x130026000, nbytes=69, pool=0, may_return_null=0) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memory_shared.c:179
> #32664 0x00007fce3dc592cd in vl_api_rpc_call_main_thread_inline (force_rpc=0 '\000', fp=<optimized out>, data=<optimized out>, data_length=<optimized out>) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:617
> #32665 vl_api_rpc_call_main_thread (fp=0x7fce3c74de70 <ip_neighbor_learn>, data=0x7fcc372bdc00 "& \001$ ", data_length=28) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlibmemory/memclnt_api.c:641
> #32666 0x00007fce3cc7fe2d in icmp6_neighbor_solicitation_or_advertisement (vm=0x7fccc0864000, frame=0x7fcccd7d2d40, is_solicitation=1, node=<optimized out>) at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:163
> #32667 icmp6_neighbor_solicitation (vm=0x7fccc0864000, node=0x7fccc09e3380, frame=0x7fcccd7d2d40) at /home/jenkins/tnsr-pkgs/work/vpp/src/vnet/ip6-nd/ip6_nd.c:322
> #32668 0x00007fce3c1a2fe0 in dispatch_node (vm=0x7fccc0864000, node=0x7fce3dc74836, type=VLIB_NODE_TYPE_INTERNAL, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fcccd7d2d40, last_time_stamp=4014159654296481) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:961
> #32669 dispatch_pending_node (vm=0x7fccc0864000, pending_frame_index=7, last_time_stamp=4014159654296481) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1120
> #32670 vlib_main_or_worker_loop (vm=0x7fccc0864000, is_main=0) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1589
> #32671 vlib_worker_loop (vm=vm@entry=0x7fccc0864000) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/main.c:1723
> #32672 0x00007fce3c1f581a in vlib_worker_thread_fn (arg=0x7fccbdb11b40) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:1579
> #32673 0x00007fce3c1f02c1 in vlib_worker_thread_bootstrap_fn (arg=0x7fccbdb11b40) at /home/jenkins/tnsr-pkgs/work/vpp/src/vlib/threads.c:418
> #32674 0x00007fce3be3db43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
> #32675 0x00007fce3becfa00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>

According to the backtrace, an IPv6 neighbor is being learned. Since the
packet was received on a worker thread, the neighbor information is passed to
the main thread via an RPC call (which goes through the binary API). For this,
an API message for the RPC call is allocated from the API segment (as a
client), but the allocation fails because there is no memory available.
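
To make the failure mode concrete, below is a minimal standalone sketch (plain
C, not VPP code; the 256MB segment size and the 69-byte message size are taken
from our setup and from frame #32663, everything else is our simplification).
Every learned neighbor makes a worker allocate a small RPC message from a
bounded segment with abort-on-failure semantics (may_return_null=0,
os_out_of_memory_on_failure=1), so if nothing ever frees those messages, the
segment fills up and the process panics instead of dropping the request:

#include <stdio.h>
#include <stdlib.h>

#define SEGMENT_SIZE (256u << 20)  /* 256MB, as configured */
#define MSG_SIZE     69            /* nbytes seen in frame #32663 */

static size_t segment_used;

/* Stand-in for the API-segment allocation with may_return_null=0:
   panic instead of returning NULL when the segment is full. */
static void *
segment_alloc (size_t nbytes)
{
  if (segment_used + nbytes > SEGMENT_SIZE)
    {
      fprintf (stderr, "API segment exhausted after %zu allocations\n",
               segment_used / MSG_SIZE);
      abort ();                 /* stand-in for os_panic () */
    }
  segment_used += nbytes;
  return malloc (nbytes);
}

int
main (void)
{
  /* Workers keep learning neighbors and queueing RPC messages;
     nothing on the consumer side ever frees them. */
  for (;;)
    segment_alloc (MSG_SIZE);
}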

If we inspect the API rings after the crash, we can see that they are all
filled with VL_API_RPC_CALL messages. We can also see a very large number of
pending RPC requests (vm->pending_rpc_requests holds ~3.3M items). A
back-of-the-envelope estimate supports this being the cause: ~3.3M messages of
roughly 69 bytes each (the size seen in frame #32663), plus per-allocation
overhead, accounts for essentially the whole 256MB segment. Thus, the API
segment is exhausted by the huge number of pending RPC messages.

RPC messages are processed by a process node called api-rx-from-ring (the
function is vl_api_clnt_process), and process nodes run on the main thread
only.
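
Below is a stripped-down, single-threaded model of that (again plain C, not
VPP code; the 20s interval mirrors the resume_clock_interval shown further
down, the rest is our simplification): the process gets dispatched only from
the main loop, and only when either its event is signalled or its resume timer
fires. If neither of those paths ever triggers, the pending RPCs are never
drained:

#include <stdbool.h>
#include <stdio.h>

#define RESUME_INTERVAL 20      /* seconds; mirrors the 20s resume interval */

static int pending_rpcs;        /* what the workers keep growing */
static bool event_signalled;    /* models QUEUE_SIGNAL_EVENT */
static double resume_deadline;  /* models the timing-wheel timer */

/* Stand-in for vl_api_clnt_process being dispatched. */
static void
api_rx_process (double now)
{
  printf ("t=%3.0fs: draining %d pending RPCs\n", now, pending_rpcs);
  pending_rpcs = 0;
  event_signalled = false;
  resume_deadline = now + RESUME_INTERVAL;  /* re-arm the resume timer */
}

int
main (void)
{
  resume_deadline = RESUME_INTERVAL;
  for (double now = 0; now < 100; now += 1)
    {
      pending_rpcs += 1000;     /* workers keep producing RPC messages */

      /* Main loop: dispatch the process only on an event or timer expiry.
         If the recorded timer no longer belongs to this process and the
         event is missed, neither branch fires and the backlog only grows. */
      if (event_signalled || now >= resume_deadline)
        api_rx_process (now);
    }
  printf ("left undrained: %d RPCs\n", pending_rpcs);
  return 0;
}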

The first hypothesis is that the main loop of the main thread pauses for so
long that the worker threads (which keep running) accumulate a huge number of
pending RPC messages. However, inspecting vm->loop_interval_start of all
threads after the crash doesn't seem to confirm this: if that were the case,
loop_interval_start of the worker threads would be greater than that of the
main thread, whereas the opposite is observed.

> (gdb) p vlib_global_main.vlib_mains[0]->loop_interval_start
> $117 = 197662.55595008997
> (gdb) p vlib_global_main.vlib_mains[1]->loop_interval_start
> $119 = 197659.82887979984
> (gdb) p vlib_global_main.vlib_mains[2]->loop_interval_start
> $121 = 197659.93944517447
>
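
Spelling the comparison out (values copied from the dump above; that a larger
loop_interval_start means a more recently started loop interval is our reading
of it):

#include <stdio.h>

int
main (void)
{
  double main_thread = 197662.55595008997;  /* vlib_mains[0] */
  double worker_1 = 197659.82887979984;     /* vlib_mains[1] */
  double worker_2 = 197659.93944517447;     /* vlib_mains[2] */

  /* The main thread's interval started *after* both workers', so its loop
     was still iterating within roughly 3 seconds of the crash. */
  printf ("main - worker 1 = %.2f s\n", main_thread - worker_1);
  printf ("main - worker 2 = %.2f s\n", main_thread - worker_2);
  return 0;
}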

The second hypothesis is that at some point pending RPC messages stop being
processed entirely and keep accumulating for as long as memory permits.
Inspecting the process node after the crash seems to confirm this:
vm->main_loop_count is much bigger than the node's
main_loop_count_last_dispatch (the difference is ~50M iterations), even
though, according to the flags, the node is suspended waiting for a timer or
an event, with a resume interval of 20s. Also, vm->queue_signal_pending is
set, which means that QUEUE_SIGNAL_EVENT was signaled and the process node
should have been dispatched shortly afterwards. Such a big gap since the
node's last dispatch also suggests that it wasn't resumed by the timer either.

> (gdb) p vlib_global_main.vlib_mains[0]->main_loop_count
> $115 = 1696611558
> (gdb) p vlib_global_main.vlib_mains[0]->queue_signal_pending
> $116 = 1
> (gdb) p *vlib_global_main.vlib_mains[0]->node_main.processes[26]
> $122 = {[...], function = 0x7fce3dc57260 <vl_api_clnt_process>,
> [...], main_loop_count_last_dispatch = 1646632623,
> [...], flags = 11,
> [...], resume_clock_interval = 2000000,
> [...]}
>
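
Just the arithmetic on the two counters above:

#include <stdio.h>

int
main (void)
{
  unsigned long main_loop_count = 1696611558UL;  /* vm->main_loop_count */
  unsigned long last_dispatch = 1646632623UL;    /* main_loop_count_last_dispatch */

  /* ~50M main-loop iterations since vl_api_clnt_process last ran. */
  printf ("%lu iterations since last dispatch\n",
          main_loop_count - last_dispatch);
  return 0;
}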

It is also worth noting that if we look at the stop_timer_handle kept by the
process node, the timer it refers to doesn't belong to it; in fact, it belongs
to another process node. This might look like the timer slot was reused.
However, comparing the main_loop_count_last_dispatch values across crashes
gives mixed results: after one crash it does look like a reuse, but after
another it doesn't. For example, in the case below it doesn't look like a
reuse, because vl_api_clnt_process was dispatched (and would therefore have
re-armed its own timer) more recently than ip4_sv_reass_walk_expired:
1,646,632,623 > 1,552,541,082.

> (gdb) p *vlib_global_main.vlib_mains[0]->node_main.processes[26]
> $254 = {[...], function = 0x7fce3dc57260 <vl_api_clnt_process>,
> [...], main_loop_count_last_dispatch = 1646632623,
> [...], stop_timer_handle = 3087,
> [...]}
> (gdb) p ((tw_timer_wheel_1t_3w_1024sl_ov_t*)vlib_global_main.vlib_mains[0]->node_main.timing_wheel)->timers[3087]
> $255 = {next = 3095, prev = 3074, {{fast_ring_offset = 287, slow_ring_offset = 356}, expiration_time = 18446744069437915423}, user_handle = 84}
> (gdb) p *vlib_global_main.vlib_mains[0]->node_main.processes[42]
> $256 = {[...], function = 0x7fce3c638700 <ip4_sv_reass_walk_expired>,
> [...], main_loop_count_last_dispatch = 1552541082,
> [...], stop_timer_handle = 3087,
> [...]}
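
If we read vlib/node_funcs.h correctly (an assumption on our part, not
something verified in this core), the timing wheel stores
vlib_timing_wheel_data_set_suspended_process (index), i.e. 2 * index with bit
0 clear, in user_handle for a suspended process. Decoding user_handle = 84
from the dump above:

#include <stdio.h>

int
main (void)
{
  unsigned user_handle = 84;    /* from timers[3087] above */

  /* Bit 0 distinguishes timed events (1) from suspended processes (0);
     the remaining bits are the process index -- assuming the encoding
     from vlib_timing_wheel_data_set_suspended_process () holds. */
  printf ("timed event: %u, process index: %u\n",
          user_handle & 1, user_handle >> 1);
  return 0;
}

That gives process index 42, i.e. ip4_sv_reass_walk_expired (processes[42]
above) rather than vl_api_clnt_process (processes[26]), which is consistent
with the timer slot now being owned by the other node.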


Regards,
Alexander