Hoi, I've had a few instances of a recent VPP hanging: the API and CLI go unresponsive and forwarding stops (at least, I think it does), but the worker threads keep consuming CPU. Attaching GDB, I see the main thread doing the following:
(gdb) bt
#0  0x00007f5f6f8f271b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007f5f6fb3df8b in spin_acquire_lock (sl=<optimized out>) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:468
#2  mspace_malloc (msp=0x130048040, bytes=72) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:4351
#3  0x00007f5f6fb66f81 in mspace_memalign (msp=0x130048040, alignment=<optimized out>, bytes=72) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:4667
#4  clib_mem_heap_alloc_inline (heap=<optimized out>, size=72, align=<optimized out>, os_out_of_memory_on_failure=1) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:608
#5  clib_mem_heap_alloc_aligned (heap=<optimized out>, size=72, align=8) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:664
#6  0x00007f5f6fba5157 in _vec_alloc_internal (n_elts=64, attr=<optimized out>) at /home/pim/src/vpp/src/vppinfra/vec.c:35
#7  0x00007f5f6fb848c8 in _vec_resize (vp=<optimized out>, n_add=64, hdr_sz=0, align=8, elt_sz=<optimized out>) at /home/pim/src/vpp/src/vppinfra/vec.h:256
#8  serialize_vector_write (m=<optimized out>, s=0x7f5f0dbfebc0) at /home/pim/src/vpp/src/vppinfra/serialize.c:908
#9  0x00007f5f6fb843c1 in serialize_write_not_inline (m=0x7f5f0dbfeb60, s=<optimized out>, n_bytes_to_write=4, flags=<optimized out>) at /home/pim/src/vpp/src/vppinfra/serialize.c:734
#10 0x00007f5f6fe5a053 in serialize_stream_read_write (header=0x7f5f0dbfeb60, s=<optimized out>, n_bytes=4, flags=2) at /home/pim/src/vpp/src/vppinfra/serialize.h:140
#11 serialize_get (m=0x7f5f0dbfeb60, n_bytes=4) at /home/pim/src/vpp/src/vppinfra/serialize.h:180
#12 serialize_integer (m=0x7f5f0dbfeb60, x=<optimized out>, n_bytes=4) at /home/pim/src/vpp/src/vppinfra/serialize.h:187
#13 vl_api_serialize_message_table (am=0x7f5f6fe66258 <api_global_main>, vector=<optimized out>) at /home/pim/src/vpp/src/vlibapi/api_shared.c:210
#14 0x00007f5f6fe5a715 in vl_msg_api_trace_save (am=0x130048040, which=<optimized out>, fp=0x13f0690, is_json=27 '\033') at /home/pim/src/vpp/src/vlibapi/api_shared.c:410
#15 0x00007f5f6fe5c0ea in vl_msg_api_post_mortem_dump () at /home/pim/src/vpp/src/vlibapi/api_shared.c:880
#16 0x00000000004068c6 in os_panic () at /home/pim/src/vpp/src/vpp/vnet/main.c:415
#17 0x00007f5f6fb3feed in mspace_free (msp=0x130048040, mem=<optimized out>) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:2954
#18 0x00007f5f6fb6bf8c in clib_mem_heap_free (heap=0x0, p=<optimized out>) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:768
#19 clib_mem_free (p=<optimized out>) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:774
#20 0x00007f5f2fa32b40 in ?? ()
#21 0x00007f5f3302f848 in ?? ()
#22 0x0000000000000000 in ?? ()
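If I read that trace correctly, the main thread has deadlocked against itself: frame #17 shows mspace_free() holding the dlmalloc heap lock when something goes wrong and os_panic() is called (frame #16), and the post-mortem dump then tries to allocate from that same heap (frames #2-#6), so spin_acquire_lock() spins on sched_yield() forever. A minimal sketch of that pattern - not VPP code, all names below are invented for illustration - which compiles and hangs the same way:

#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for the per-mspace spin lock: non-recursive, so re-acquiring
 * it on the same thread spins forever. */
static atomic_flag heap_lock = ATOMIC_FLAG_INIT;

static void
heap_lock_acquire (void)
{
  while (atomic_flag_test_and_set (&heap_lock))
    sched_yield ();             /* where the main thread sits in frame #0 */
}

static void
heap_lock_release (void)
{
  atomic_flag_clear (&heap_lock);
}

static void *
heap_alloc (unsigned long nbytes)
{
  heap_lock_acquire ();         /* already held by heap_free () below */
  void *p = 0;                  /* ... allocate nbytes ... */
  heap_lock_release ();
  return p;
}

static void
post_mortem_dump (void)
{
  /* stands in for vl_msg_api_post_mortem_dump (): serializing the message
   * table grows a vector, i.e. allocates from the same heap */
  heap_alloc (72);
}

static void
heap_free (void *p)
{
  (void) p;
  heap_lock_acquire ();
  /* ... inconsistency detected -> os_panic () -> post-mortem dump ... */
  post_mortem_dump ();          /* re-enters the allocator, never returns */
  heap_lock_release ();
}

int
main (void)
{
  fprintf (stderr, "freeing...\n");
  heap_free ((void *) 1);       /* hangs, spinning in sched_yield () */
  return 0;
}

If that reading is right, the hang itself is a secondary symptom: the real bug is whatever made mspace_free() unhappy in the first place, and the post-mortem path then wedges the main thread instead of letting it abort and produce a core.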
When I kill VPP, sometimes an api_post_mortem is emitted (although most of the time they are empty), but subsequently trying to dump it makes VPP crash:

-rw------- 1 ipng ipng 35437 Jan 8 19:08 api_post_mortem.76724
-rw------- 1 ipng ipng 35368 Jan 8 19:08 api_post_mortem.76842
-rw------- 1 ipng ipng     0 Jan 8 19:08 api_post_mortem.76978
-rw------- 1 ipng ipng     0 Jan 8 19:08 api_post_mortem.84008

#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7fada5f in vl_msg_print_trace (msg=0x7fff9db73bd8 "", ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:693
#2  0x00007ffff66a55bb in vl_msg_traverse_trace (tp=0x7fff9b4e7998, fn=0x7ffff7fad790 <vl_msg_print_trace>, ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibapi/api_shared.c:321
#3  0x00007ffff7fab854 in api_trace_command_fn (vm=0x7fff96000700, input=0x7fff53b62f30, cmd=<optimized out>) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:727
#4  0x00007ffff647fdad in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700, cm=<optimized out>, input=0x7fff53b62f30, parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:650
#5  0x00007ffff647fb91 in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700, cm=<optimized out>, input=0x7fff53b62f30, parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:607
#6  0x00007ffff647f0cd in vlib_cli_input (vm=0x7fff96000700, input=0x7fff53b62f30, function=<optimized out>, function_arg=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:753
#7  0x00007ffff64fd5c7 in unix_cli_process_input (cm=<optimized out>, cli_file_index=0) at /home/pim/src/vpp/src/vlib/unix/cli.c:2616
#8  unix_cli_process (vm=<optimized out>, rt=0x7fff9b69bdc0, f=<optimized out>) at /home/pim/src/vpp/src/vlib/unix/cli.c:2745
#9  0x00007ffff64a7837 in vlib_process_bootstrap (_a=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1221
#10 0x00007ffff63f9d94 in clib_calljmp () at /home/pim/src/vpp/src/vppinfra/longjmp.S:123
#11 0x00007fff94700b00 in ?? ()
#12 0x00007ffff649f3d0 in vlib_process_startup (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0) at /home/pim/src/vpp/src/vlib/main.c:1246
#13 dispatch_process (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0, last_time_stamp=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1302
#14 0x0000000000000000 in ?? ()

Has anybody else seen API calls seemingly hang the VPP instance? Is there an alternative way to pry loose the information in the api_post_mortem.* files? Or any other clues on where to narrow down the issue? It's a rare issue: I'm running a dozen or so instances with 6mo+ of uptime, and one of them hit this hang/crash a few times in a row this week.

-- 
Pim van Pelt <p...@ipng.nl>
PBVP1-RIPE - http://www.ipng.nl/
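P.S. My (naive) reading of the second trace: frame #0 is at address 0x0, which looks like the trace-dump path ending up calling a NULL per-message print handler, perhaps because the message table saved by the already-wedged process is bogus, or because the trace references a message id this binary never registered a handler for. Purely to illustrate that failure mode - the names below are invented, I haven't checked what vl_msg_print_trace actually looks up in vlib_api_cli.c:

#include <stdio.h>

typedef int (*msg_print_fn) (void *msg, void *handle);

/* hypothetical per-message print handler table; an id that was never
 * registered (or a corrupted id read back from the trace file) leaves a
 * NULL pointer here, and calling it is exactly a jump to address 0x0 */
static msg_print_fn print_handlers[8];

static int
print_one_msg (unsigned msg_id, void *msg, void *handle)
{
  msg_print_fn fn = msg_id < 8 ? print_handlers[msg_id] : 0;
  if (fn == 0)
    {
      fprintf (stderr, "msg id %u: no print handler, skipping\n", msg_id);
      return 0;
    }
  return fn (msg, handle);
}

int
main (void)
{
  char dummy[4] = { 0 };
  /* without the NULL check above this call would jump to address 0x0,
   * which is what frame #0 of the second backtrace looks like */
  return print_one_msg (3, dummy, stdout);
}

If that is what's happening, a NULL check before the call in the real code would at least turn the crash into a skipped message and let the rest of the trace dump.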