Hi Matt, 

Did you try checking whether the quic plugin is loaded, just to see if there’s
a connection there?

Regards,
Florin

> On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io 
> <mgsmith=netgate....@lists.fd.io> wrote:
> 
> Hi,
> 
> I saw VPP crash several times during tests that were run to evaluate IPsec
> performance. The last upstream commit in my build of VPP is
> 'fd77f8c00 quic: remove cmake --target'. The tests ran on a C3000 with an
> onboard QAT. The tests were repeated with the QAT removed from the device
> whitelist in startup.conf (using async crypto with sw_scheduler), and the
> same crash occurred.
> 
> The relevant part of the stack trace looks like this:
> 
> #8  0x00007fdbb4006459 in os_out_of_memory () at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> #9  0x00007fdbb400d1fb in clib_mem_alloc_aligned_at_offset (size=2305843009213692256, align=8, align_offset=8, os_out_of_memory_on_failure=1) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0, length_increment=288230376151711515, data_bytes=2305843009213692256, header_bytes=8, data_align=8, numa_id=255) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> #11 0x00007fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0, length_increment=288230376151711515, data_bytes=2305843009213692248, header_bytes=0, data_align=8, numa_id=255) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 <cryptodev_raw_dequeue>, n_cache=1, n_total=0x7fdb145053dc) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, frame=0x0) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> #16 0x00007fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=207016971809128) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
> 
> In vnet_crypto_async_free_frame(), it appears that pool_put() is being asked
> to return an element to a pool that it does not belong to:
> 
> (gdb) frame 13
> #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) at 
> /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> 585  pool_put (ct->frame_pool, frame);
> (gdb) p frame - ct->frame_pool
> $1 = -13689
> 
> My guess is that a pointer to a vnet_crypto_async_frame_t was stored by the
> crypto engine, and before it could be dequeued, the pool filled up and had to
> be reallocated. Each per-thread frame_pool is initially allocated with room
> for 1024 entries, and ct->frame_pool had a vector length of 1025 when the
> crash occurred.
> 
> Can anyone with knowledge of the async crypto code confirm or refute that
> theory? Does anyone have suggestions on the best way to fix this?
> 
> Thanks,
> -Matt

View/Reply Online (#19480): https://lists.fd.io/g/vpp-dev/message/19480