Hi Matt,

No worries. I asked because, as luck would have it, quic does use the crypto infra :-)

Cheers,
Florin

> On May 27, 2021, at 6:02 AM, Matthew Smith <[email protected]> wrote:
>
> Hi Florin!
>
> It appears that the quic plugin is disabled in my build:
>
> 2021/05/27 07:44:49:044 notice plugin/load Plugin disabled (default): quic_plugin.so
>
> I didn't mean to give the impression that I thought this issue was caused by
> quic. My mention of the quic commit was just intended to indicate how up to
> date my build is with the gerrit master branch in case there were
> recent/pending patches that people know of that might be relevant. That quic
> commit is from about 2 weeks ago, which is the last time I merged upstream
> changes.
>
> Thanks,
> -Matt
>
>
> On Wed, May 26, 2021 at 5:58 PM Florin Coras <[email protected]> wrote:
> Hi Matt,
>
> Did you try checking if quic plugin is loaded, just to see if there’s a
> connection there.
>
> Regards,
> Florin
>
> > On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io
> > <[email protected]> wrote:
> >
> > Hi,
> >
> > I saw VPP crash several times during some tests that were running to
> > evaluate IPsec performance. The last upstream commit on my build of VPP is
> > 'fd77f8c00 quic: remove cmake --target'. The tests ran on a C3000 with an
> > onboard QAT. The tests were repeated with the QAT removed from the device
> > whitelist in startup.conf (using async crypto with sw_scheduler) and the
> > same thing happened.
> >
> > The relevant part of the stack trace looks like this:
> >
> > #8  0x00007fdbb4006459 in os_out_of_memory ()
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> > #9  0x00007fdbb400d1fb in clib_mem_alloc_aligned_at_offset
> >     (size=2305843009213692256, align=8, align_offset=8,
> >     os_out_of_memory_on_failure=1)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> > #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0,
> >     length_increment=288230376151711515, data_bytes=2305843009213692256,
> >     header_bytes=8, data_align=8, numa_id=255)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> > #11 0x00007fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0,
> >     length_increment=288230376151711515, data_bytes=2305843009213692248,
> >     header_bytes=0, data_align=8, numa_id=255)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> > #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280,
> >     ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 <cryptodev_raw_dequeue>, n_cache=1,
> >     n_total=0x7fdb145053dc)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> > #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280,
> >     frame=0x0)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> > #16 0x00007fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80,
> >     node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT,
> >     dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
> >     last_time_stamp=207016971809128)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> > #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
> >
> > In vnet_crypto_async_free_frame() it appears that a call to pool_put() is
> > trying to return a pointer to a pool that it is not a member of:
> >
> > (gdb) frame 13
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280)
> >     at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > 585       pool_put (ct->frame_pool, frame);
> > (gdb) p frame - ct->frame_pool
> > $1 = -13689
> >
> > It seems like maybe a pointer to a vnet_crypto_async_frame_t was stored by
> > the crypto engine and before it could be dequeued the pool filled and had
> > to be reallocated. The per-thread frame_pool's are allocated with room for
> > 1024 entries initially and ct->frame_pool had a vector length of 1025 when
> > the crash occurred.
> >
> > Can anyone with knowledge of the async crypto code confirm or refute that
> > theory? Anyone have suggestions on the best way to fix this?
> >
> > Thanks,
> > -Matt
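
The numbers in the trace can be cross-checked with a little arithmetic, and they appear consistent with the negative pool index from gdb (-13689) being reinterpreted as an unsigned 64-bit bit position. The sketch below is not VPP code. It assumes 64-bit bitmap words, and it assumes the pool's free bitmap already held 16 words (1024 bits, matching the initial pool size mentioned above); that second figure is inferred from the trace values, not read from the source. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a signed pool index as an unsigned 64-bit bit position. */
static uint64_t index_as_unsigned(int64_t pool_index)
{
    return (uint64_t) pool_index;
}

/* Assumption: 64-bit bitmap words, so setting bit i needs i / 64 + 1 words. */
static uint64_t bitmap_words_needed(uint64_t i)
{
    return i / 64 + 1;
}

/* Compare the derived values against the figures printed in the backtrace. */
static void check_against_trace(void)
{
    /* gdb: p frame - ct->frame_pool => -13689 */
    uint64_t i = index_as_unsigned(-13689);
    assert(i == 18446744073709537927ULL);          /* frame #12, i= */

    uint64_t words = bitmap_words_needed(i);

    /* Assumption (inferred): the bitmap already had 16 words = 1024 bits. */
    uint64_t words_present = 16;
    assert(words - words_present == 288230376151711515ULL); /* length_increment */

    /* 8 bytes per word gives data_bytes in frame #11. */
    uint64_t data_bytes = words * sizeof(uint64_t);
    assert(data_bytes == 2305843009213692248ULL);

    /* Adding header_bytes=8 (frame #10) gives the size requested in frame #9. */
    assert(data_bytes + 8 == 2305843009213692256ULL);
}
```

Calling check_against_trace() from a main() passes all assertions, which fits the theory that pool_put() computed an index from a frame pointer that no longer lies inside ct->frame_pool. It does not by itself show why the pointer is stale.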
