Hi Florin!

It appears that the quic plugin is disabled in my build:

2021/05/27 07:44:49:044 notice     plugin/load    Plugin disabled (default): quic_plugin.so

I didn't mean to give the impression that I thought this issue was caused
by quic. My mention of the quic commit was just intended to indicate how up
to date my build is with the gerrit master branch in case there were
recent/pending patches that people know of that might be relevant. That
quic commit is from about 2 weeks ago, which is the last time I merged
upstream changes.

Thanks,
-Matt


On Wed, May 26, 2021 at 5:58 PM Florin Coras <[email protected]> wrote:

> Hi Matt,
>
> Did you try checking whether the quic plugin is loaded, just to see if there’s a
> connection there?
>
> Regards,
> Florin
>
> > On May 26, 2021, at 3:19 PM, Matthew Smith via lists.fd.io <mgsmith=
> [email protected]> wrote:
> >
> > Hi,
> >
> > I saw VPP crash several times during some tests that were running to
> evaluate IPsec performance. The last upstream commit on my build of VPP is
> 'fd77f8c00 quic: remove cmake --target'. The tests ran on a C3000 with an
> onboard QAT. The tests were repeated with the QAT removed from the device
> whitelist in startup.conf (using async crypto with sw_scheduler) and the
> same thing happened.
> >
> > The relevant part of the stack trace looks like this:
> >
> > #8  0x00007fdbb4006459 in os_out_of_memory () at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/unix-misc.c:221
> > #9  0x00007fdbb400d1fb in clib_mem_alloc_aligned_at_offset (size=2305843009213692256, align=8, align_offset=8, os_out_of_memory_on_failure=1) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/mem.h:243
> > #10 vec_resize_allocate_memory (v=0x7fdb36a9b7f0, length_increment=288230376151711515, data_bytes=2305843009213692256, header_bytes=8, data_align=8, numa_id=255) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.c:111
> > #11 0x00007fdbb60efe01 in _vec_resize_inline (v=0x7fdb36a9b7f0, length_increment=288230376151711515, data_bytes=2305843009213692248, header_bytes=0, data_align=8, numa_id=255) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/vec.h:170
> > #12 clib_bitmap_ori_notrim (ai=0x7fdb36a9b7f0, i=18446744073709537927) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vppinfra/bitmap.h:643
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > #14 crypto_dequeue_frame (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, ct=0x7fdb33537f80, hdl=0x7fdb2bc32810 <cryptodev_raw_dequeue>, n_cache=1, n_total=0x7fdb145053dc) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:135
> > #15 crypto_dispatch_node_fn (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, frame=0x0) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/node.c:166
> > #16 0x00007fdbb4b789e5 in dispatch_node (vm=0x7fdb356f7a80, node=0x7fdb36bbd280, type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=207016971809128) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1024
> > #17 vlib_main_or_worker_loop (vm=0x7fdb356f7a80, is_main=0) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vlib/main.c:1618
> >
> > In vnet_crypto_async_free_frame() it appears that a call to pool_put()
> is trying to return a pointer to a pool that it is not a member of:
> >
> > (gdb) frame 13
> > #13 vnet_crypto_async_free_frame (vm=0x7fdb356f7a80, frame=0x7fdb3461c280) at /usr/src/debug/vpp-21.01-568~g67ff5da46.el8.x86_64/src/vnet/crypto/crypto.h:585
> > 585  pool_put (ct->frame_pool, frame);
> > (gdb) p frame - ct->frame_pool
> > $1 = -13689
> >
> > It seems like maybe a pointer to a vnet_crypto_async_frame_t was stored
> by the crypto engine and, before it could be dequeued, the pool filled and
> had to be reallocated. The per-thread frame_pools are allocated with room
> for 1024 entries initially and ct->frame_pool had a vector length of 1025
> when the crash occurred.
> >
> > Can anyone with knowledge of the async crypto code confirm or refute
> that theory? Anyone have suggestions on the best way to fix this?
> >
> > Thanks,
> > -Matt
> >
> >
> > 
> >
>
>