Hi Matt,

Did you try checking whether the quic plugin is loaded, just to see if there's a connection there?

Regards,
Florin

> On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io <mgsmith=netgate....@lists.fd.io> wrote:
>
> Hi,
>
> I saw VPP crash several times during some tests that were running to evaluate
> IPsec performance. The last upstream commit on my build of VPP is 'fd77f8c00
> quic: remove cmake --target'. The tests ran on a C3000 with an onboard QAT.
> The tests were repeated with the QAT removed from the device whitelist in
> startup.conf (using async crypto with sw_scheduler) and the same thing
> happened.
>
> The relevant part of the stack trace looks like this:
>
> #8 0x00007fdbb4006459 in os_out_of_memory () at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> #9 0x00007fdbb400d1fb in clib_mem_alloc_aligned_at_offset
>     (size=2305843009213692256, align=8, align_offset=8,
>     os_out_of_memory_on_failure=1) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0,
>     length_increment=288230376151711515, data_bytes=2305843009213692256,
>     header_bytes=8, data_align=8, numa_id=255) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> #11 0x00007fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0,
>     length_increment=288230376151711515, data_bytes=2305843009213692248,
>     header_bytes=0, data_align=8, numa_id=255) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280,
>     ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 <cryptodev_raw_dequeue>, n_cache=1,
>     n_total=0x7fdb145053dc) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280,
>     frame=0x0) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> #16 0x00007fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80,
>     node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT,
>     dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
>     last_time_stamp=207016971809128) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
>
> In vnet_crypto_async_free_frame() it appears that a call to pool_put() is
> trying to return a pointer to a pool that it is not a member of:
>
> (gdb) frame 13
> #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) at
>     /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> 585 pool_put (ct->frame_pool, frame);
> (gdb) p frame - ct->frame_pool
> $1 = -13689
>
> It seems like maybe a pointer to a vnet_crypto_async_frame_t was stored by
> the crypto engine and before it could be dequeued the pool filled and had to
> be reallocated. The per-thread frame_pool's are allocated with room for 1024
> entries initially and ct->frame_pool had a vector length of 1025 when the
> crash occurred.
>
> Can anyone with knowledge of the async crypto code confirm or refute that
> theory? Anyone have suggestions on the best way to fix this?
>
> Thanks,
> -Matt
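
For reference, the theory in the quoted message (an element pointer saved before the pool grows no longer belongs to the pool afterwards) can be illustrated with a small standalone sketch. This is not VPP code: toy_frame_t, toy_pool_t and the toy_pool_* functions are hypothetical names made up for the illustration, and the growth policy (start at 4 entries, double when full, always move to a new block) is an assumption chosen so the effect is reproducible. VPP's pool and vec macros differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for vnet_crypto_async_frame_t; not the VPP type. */
typedef struct { int state; } toy_frame_t;

/* Hypothetical growable pool backed by one contiguous allocation. */
typedef struct {
  toy_frame_t *elts; /* base address; changes whenever the pool grows */
  size_t len;        /* elements in use */
  size_t cap;        /* elements allocated */
} toy_pool_t;

/* Take one element from the pool, growing the storage when it is full.
 * Growth always allocates a new block before freeing the old one, so the
 * base address is guaranteed to change. Returns the element's index, or
 * (size_t)-1 if allocation fails. */
size_t toy_pool_get(toy_pool_t *p) {
  if (p->len == p->cap) {
    size_t new_cap = p->cap ? p->cap * 2 : 4;
    toy_frame_t *n = malloc(new_cap * sizeof *n);
    if (n == NULL)
      return (size_t)-1;
    if (p->len > 0)
      memcpy(n, p->elts, p->len * sizeof *n);
    free(p->elts);
    p->elts = n;
    p->cap = new_cap;
  }
  p->elts[p->len].state = 0;
  return p->len++;
}

/* Resolve an index against the pool's current base address. */
toy_frame_t *toy_pool_elt(toy_pool_t *p, size_t index) {
  assert(index < p->len);
  return &p->elts[index];
}

/* Return 1 if addr is the address of an in-use element of the pool's
 * current storage, 0 otherwise. Addresses are compared as integers so a
 * value saved before a growth can be checked without touching freed memory. */
int toy_pool_contains(const toy_pool_t *p, uintptr_t addr) {
  uintptr_t base = (uintptr_t)p->elts;
  uintptr_t end = base + p->len * sizeof *p->elts;
  return addr >= base && addr < end && (addr - base) % sizeof *p->elts == 0;
}
```

In this sketch, an address saved from toy_pool_elt() while the pool holds four elements stops passing toy_pool_contains() once a fifth toy_pool_get() forces the storage to move, while the saved index still resolves to a valid element. If the theory is right, handing the engine an index (or re-resolving the pointer after any possible growth) would be one way to avoid the stale pointer; whether that fits the async crypto code is for someone who knows it to say.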