Hi Matt, 

No worries. I asked because, as luck would have it, quic does use the crypto 
infra :-)

Cheers, 
Florin

> On May 27, 2021, at 6:02 AM, Matthew Smith <[email protected]> wrote:
> 
> Hi Florin!
> 
> It appears that the quic plugin is disabled in my build:
> 
> 2021/05/27 07:44:49:044 notice     plugin/load    Plugin disabled (default): 
> quic_plugin.so
> 
> I didn't mean to give the impression that I thought this issue was caused by 
> quic. My mention of the quic commit was just intended to indicate how up to 
> date my build is with the gerrit master branch in case there were 
> recent/pending patches that people know of that might be relevant. That quic 
> commit is from about 2 weeks ago, which is the last time I merged upstream 
> changes.
> 
> Thanks,
> -Matt
> 
> 
> On Wed, May 26, 2021 at 5:58 PM Florin Coras <[email protected]> wrote:
> Hi Matt, 
> 
> Did you try checking whether the quic plugin is loaded, just to see if 
> there's a connection there?
> 
> Regards,
> Florin
> 
> > On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io 
> > <[email protected]> wrote:
> > 
> > Hi,
> > 
> > I saw VPP crash several times during some tests that were running to 
> > evaluate IPsec performance. The last upstream commit on my build of VPP is 
> > 'fd77f8c00 quic: remove cmake --target'. The tests ran on a C3000 with an 
> > onboard QAT. The tests were repeated with the QAT removed from the device 
> > whitelist in startup.conf (using async crypto with sw_scheduler) and the 
> > same thing happened.
> > 
> > The relevant part of the stack trace looks like this:
> > 
> > #8  0x00007fdbb4006459 in os_out_of_memory () at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> > #9  0x00007fdbb400d1fb in clib_mem_alloc_aligned_at_offset 
> > (size=2305843009213692256, align=8, align_offset=8, 
> > os_out_of_memory_on_failure=1) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> > #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0, 
> > length_increment=288230376151711515, data_bytes=2305843009213692256, 
> > header_bytes=8, data_align=8, numa_id=255) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> > #11 0x00007fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0, 
> > length_increment=288230376151711515, data_bytes=2305843009213692248, 
> > header_bytes=0, data_align=8, numa_id=255) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> > #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) 
> > at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, 
> > ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 <cryptodev_raw_dequeue>, n_cache=1, 
> > n_total=0x7fdb145053dc) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> > #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, 
> > frame=0x0) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> > #16 0x00007fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80, 
> > node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT, 
> > dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, 
> > last_time_stamp=207016971809128) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> > #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0) at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
> > 
> > In vnet_crypto_async_free_frame() it appears that a call to pool_put() is 
> > trying to return a pointer to a pool that it is not a member of:
> > 
> > (gdb) frame 13
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) 
> > at 
> > /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > 585  pool_put (ct->frame_pool, frame);
> > (gdb) p frame - ct->frame_pool
> > $1 = -13689
> > 
> > It seems that a pointer to a vnet_crypto_async_frame_t may have been stored 
> > by the crypto engine and, before it could be dequeued, the pool filled and 
> > had to be reallocated. The per-thread frame_pools are allocated with room 
> > for 1024 entries initially, and ct->frame_pool had a vector length of 1025 
> > when the crash occurred.
> > 
> > Can anyone with knowledge of the async crypto code confirm or refute that 
> > theory? Anyone have suggestions on the best way to fix this?
> > 
> > Thanks,
> > -Matt
> > 
> > 
> > 
> > 
> 
