Hoi,

I've had a few instances of a recent VPP build hanging: the API and CLI become
unresponsive, forwarding stops (or at least appears to), but the worker threads
keep consuming CPU.
Attaching GDB, I see the main thread is doing the following:

(gdb) bt
#0  0x00007f5f6f8f271b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007f5f6fb3df8b in spin_acquire_lock (sl=<optimized out>) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:468
#2  mspace_malloc (msp=0x130048040, bytes=72) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:4351
#3  0x00007f5f6fb66f81 in mspace_memalign (msp=0x130048040, alignment=<optimized out>, bytes=72) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:4667
#4  clib_mem_heap_alloc_inline (heap=<optimized out>, size=72, align=<optimized out>, os_out_of_memory_on_failure=1) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:608
#5  clib_mem_heap_alloc_aligned (heap=<optimized out>, size=72, align=8) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:664
#6  0x00007f5f6fba5157 in _vec_alloc_internal (n_elts=64, attr=<optimized out>) at /home/pim/src/vpp/src/vppinfra/vec.c:35
#7  0x00007f5f6fb848c8 in _vec_resize (vp=<optimized out>, n_add=64, hdr_sz=0, align=8, elt_sz=<optimized out>) at /home/pim/src/vpp/src/vppinfra/vec.h:256
#8  serialize_vector_write (m=<optimized out>, s=0x7f5f0dbfebc0) at /home/pim/src/vpp/src/vppinfra/serialize.c:908
#9  0x00007f5f6fb843c1 in serialize_write_not_inline (m=0x7f5f0dbfeb60, s=<optimized out>, n_bytes_to_write=4, flags=<optimized out>) at /home/pim/src/vpp/src/vppinfra/serialize.c:734
#10 0x00007f5f6fe5a053 in serialize_stream_read_write (header=0x7f5f0dbfeb60, s=<optimized out>, n_bytes=4, flags=2) at /home/pim/src/vpp/src/vppinfra/serialize.h:140
#11 serialize_get (m=0x7f5f0dbfeb60, n_bytes=4) at /home/pim/src/vpp/src/vppinfra/serialize.h:180
#12 serialize_integer (m=0x7f5f0dbfeb60, x=<optimized out>, n_bytes=4) at /home/pim/src/vpp/src/vppinfra/serialize.h:187
#13 vl_api_serialize_message_table (am=0x7f5f6fe66258 <api_global_main>, vector=<optimized out>) at /home/pim/src/vpp/src/vlibapi/api_shared.c:210
#14 0x00007f5f6fe5a715 in vl_msg_api_trace_save (am=0x130048040, which=<optimized out>, fp=0x13f0690, is_json=27 '\033') at /home/pim/src/vpp/src/vlibapi/api_shared.c:410
#15 0x00007f5f6fe5c0ea in vl_msg_api_post_mortem_dump () at /home/pim/src/vpp/src/vlibapi/api_shared.c:880
#16 0x00000000004068c6 in os_panic () at /home/pim/src/vpp/src/vpp/vnet/main.c:415
#17 0x00007f5f6fb3feed in mspace_free (msp=0x130048040, mem=<optimized out>) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:2954
#18 0x00007f5f6fb6bf8c in clib_mem_heap_free (heap=0x0, p=<optimized out>) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:768
#19 clib_mem_free (p=<optimized out>) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:774
#20 0x00007f5f2fa32b40 in ?? ()
#21 0x00007f5f3302f848 in ?? ()
#22 0x0000000000000000 in ?? ()
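
If I read the backtrace right, frame #17 (mspace_free) already holds the dlmalloc
heap spinlock when it detects a problem and calls os_panic(), and the post-mortem
dump in frames #15..#2 then tries to allocate from the very same heap, so frame #1
spins on a lock that its own thread is holding. A toy reproduction of that pattern,
with names that are mine and not VPP's:

#include <sched.h>

static volatile int heap_lock;            /* stand-in for the dlmalloc mspace lock */

static void lock_heap (void)
{
  /* non-reentrant spinlock, like dlmalloc's spin_acquire_lock() */
  while (__sync_lock_test_and_set (&heap_lock, 1))
    sched_yield ();                       /* frame #0 in the backtrace above */
}

static void unlock_heap (void)
{
  __sync_lock_release (&heap_lock);
}

static void post_mortem_dump (void)
{
  lock_heap ();                           /* the serializer's malloc needs the heap lock */
  /* ... serialize the API trace into a freshly allocated vector ... */
  unlock_heap ();
}

static void heap_free (void *p)
{
  (void) p;
  lock_heap ();                           /* free() takes the heap lock first */
  /* corruption detected -> panic -> post-mortem dump, lock still held */
  post_mortem_dump ();                    /* spins forever: same thread, same lock */
  unlock_heap ();
}

int main (void)
{
  heap_free ((void *) 0);                 /* never returns */
  return 0;
}

If that's what is going on, the "hang" is really a panic in the free path that can't
get its post-mortem written out, rather than the API call itself wedging.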


When I kill VPP, an api_post_mortem file is sometimes emitted (although most of
the time they are empty), but subsequently trying to dump it makes VPP crash:

-rw------- 1 ipng ipng 35437 Jan  8 19:08 api_post_mortem.76724
-rw------- 1 ipng ipng 35368 Jan  8 19:08 api_post_mortem.76842
-rw------- 1 ipng ipng     0 Jan  8 19:08 api_post_mortem.76978
-rw------- 1 ipng ipng     0 Jan  8 19:08 api_post_mortem.84008


#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7fada5f in vl_msg_print_trace (msg=0x7fff9db73bd8 "", ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:693
#2  0x00007ffff66a55bb in vl_msg_traverse_trace (tp=0x7fff9b4e7998, fn=0x7ffff7fad790 <vl_msg_print_trace>, ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibapi/api_shared.c:321
#3  0x00007ffff7fab854 in api_trace_command_fn (vm=0x7fff96000700, input=0x7fff53b62f30, cmd=<optimized out>) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:727
#4  0x00007ffff647fdad in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700, cm=<optimized out>, input=0x7fff53b62f30, parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:650
#5  0x00007ffff647fb91 in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700, cm=<optimized out>, input=0x7fff53b62f30, parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:607
#6  0x00007ffff647f0cd in vlib_cli_input (vm=0x7fff96000700, input=0x7fff53b62f30, function=<optimized out>, function_arg=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:753
#7  0x00007ffff64fd5c7 in unix_cli_process_input (cm=<optimized out>, cli_file_index=0) at /home/pim/src/vpp/src/vlib/unix/cli.c:2616
#8  unix_cli_process (vm=<optimized out>, rt=0x7fff9b69bdc0, f=<optimized out>) at /home/pim/src/vpp/src/vlib/unix/cli.c:2745
#9  0x00007ffff64a7837 in vlib_process_bootstrap (_a=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1221
#10 0x00007ffff63f9d94 in clib_calljmp () at /home/pim/src/vpp/src/vppinfra/longjmp.S:123
#11 0x00007fff94700b00 in ?? ()
#12 0x00007ffff649f3d0 in vlib_process_startup (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0) at /home/pim/src/vpp/src/vlib/main.c:1246
#13 dispatch_process (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0, last_time_stamp=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1302
#14 0x0000000000000000 in ?? ()
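
Frame #0 sitting at address 0x0 looks like a NULL function pointer being called,
presumably the per-message print handler for whatever message the dump walks into
(the trace came from an already-wedged instance, so the saved message table may well
be garbage too). A guard along these lines is what I'd naively expect to keep the CLI
from crashing; the names below are made up for illustration, not the actual
vl_msg_print_trace() internals:

#include <stdio.h>

typedef void (*print_fn_t) (void *msg, void *handle);

typedef struct
{
  const char *name;            /* message name, if known */
  print_fn_t print_handler;    /* may be 0 if nothing ever registered one */
} msg_entry_t;

static void
print_one_message (msg_entry_t * e, void *msg, void *handle)
{
  if (e->print_handler == 0)   /* avoid jumping to address 0x0 */
    {
      fprintf (stderr, "no print handler for '%s', skipping\n",
               e->name ? e->name : "(unknown)");
      return;
    }
  e->print_handler (msg, handle);
}

int main (void)
{
  msg_entry_t e = { "bogus_message_from_trace", 0 };
  print_one_message (&e, 0, stderr);
  return 0;
}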

Has anybody else seen API calls seemingly hang the VPP instance? Is there an
alternative way to pry loose the information in the api_post_mortem.* files? Or
any other clues on where to narrow down the issue?
It's a rare issue (I'm running a dozen or so instances with 6mo+ of uptime, and
one of them had this hang/crash a few times in a row this week).

-- 
Pim van Pelt <p...@ipng.nl>
PBVP1-RIPE - http://www.ipng.nl/