Hi Mercury,

That indeed looks like an issue and, given the context, it might be related to the vls/ldp session sharing code. The heap itself could also leak, but that is somewhat less likely. To confirm, after a crash could you run p vl(vcm->workers[0].sessions) in gdb? That should tell us the size of the underlying pool.
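A minimal sketch of that step, in case it helps (the binary and core paths below are hypothetical, and I'm assuming the vl helper is available in your gdb session):

    $ gdb /usr/sbin/nginx /path/to/nginx-core    # hypothetical paths
    (gdb) p vl(vcm->workers[0].sessions)

If the printed pool size keeps growing with the number of accepted connections until the vcl heap is exhausted, that would point to sessions being allocated but never freed.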
And yes, the svm infra (fifo segments, fifos, mqs) is separate from the vcl heap, which is shared by all workers, even if they are different processes.

I'm assuming here you're using nginx as a web server, i.e., it only accepts connections and, once it's done with them, it closes them. If keepalive_timeout is set to a large value (it was 300 in the jira ticket configs), could you reduce it to at most 10s? Probably something similar should be done for keepalive_requests (a minimal example of both directives is sketched after the quoted message below).

Regards,
Florin

> On Nov 30, 2021, at 10:24 PM, mercury noah <mercury124...@gmail.com> wrote:
>
> Hi Florin,
>
> I tested with a relatively newer version of vpp and nginx. After
> adjusting the message queue size, vpp no longer prints the
> "failed to alloc msg" message, but nginx still crashes. I tested with
> up to 4 nginx workers.
>
> Below is my conf:
>
> vpp# show version verbose cmdline
> Version:                  v22.02-rc0~355-g376c2106c
> Compiled by:              root
> Compile host:             ubuntu
> Compile date:             2021-12-01T02:02:14
> Compile location:         /root/vpp_master
> Compiler:                 Clang/LLVM 11.0.0
> Current PID:              97177
> Command line arguments:
>   /usr/bin/vpp
>   unix { log /tmp/vpe.log cli-listen /run/vpp/cli.sock nodaemon interactive full-coredump }
>   cpu { main-core 1 corelist-workers 2 }
>   dpdk { uio-driver vfio-pci dev 0000:02:00.2 }
>   session { event-queue-length 100000 }
>
> cat $CONFIG_ROOT/vcl_iperf3.conf
> vcl {
>   heapsize 256M
>   rx-fifo-size 4000000
>   tx-fifo-size 4000000
>   # app-scope-local
>   # app-scope-global
>   api-socket-name /run/vpp/api.sock
>   segment-size 8589934592
>   add-segment-size 8589934592
>   event-queue-size 500000
> }
>
> Below is some test data. Adjusting the vcl heapsize changes the result,
> while the number of nginx workers does not. With the same heapsize, the
> number of successful connections is almost the same:
> 64M: ~1.1M, 128M: ~2.3M, 256M: ~4.8M
>
> mode: nginx master + nginx workers (master_process on)
> | heapsize | nginx worker num | success connection num |
> | :------: | :--------------: | :--------------------: |
> |   64M    |        1         |        1100925         |
> |   64M    |        2         |        1100296         |
> |   64M    |        4         |        1083280         |
> |   128M   |        1         |        2341828         |
> |   128M   |        2         |        2377562         |
> |   128M   |        4         |        2359387         |
> |   256M   |        1         |        4869103         |
> |   256M   |        2         |        4908559         |
> |   256M   |        4         |        4895719         |
>
> mode: 1 nginx worker (master_process off)
> nginx no longer crashes.
>
> I think the segments and message queues belong to svm, but the memory
> leak is in the vcl heap, and the vcl heap and svm are different parts of
> memory. If some close messages had been lost, the successful connection
> counts would probably not be almost the same, so I guess that is not the
> reason for the issue. (This is my understanding and may be wrong, since
> I'm not familiar with the code.)
>
> The strange thing is that, with the same configuration, nginx in
> master_process off mode does not crash (there is only one nginx process
> then). It looks like a race condition, but I looked into the memory heap
> code and found that it does take the lock (see PREACTION()), and vcl does
> initialize the lock:
> vppcom_cfg_heapsize() -> clib_mem_init_thread_safe() -> clib_mem_init_internal()
>   -> clib_mem_create_heap_internal() -> create_mspace_with_base() -> set_lock()
> Also, if it were a memory lock issue, the crash would not be so regular;
> it could crash at any time.
>
> I suspect the issue is perhaps on the vcl side, but I don't know where it
> is yet.
>
> Regards,
> Mercury
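One rough way to read the table above, assuming heapsize is in MiB (back-of-the-envelope arithmetic only, not a measurement):

    64M:  67,108,864 B / 1,100,925 connections ≈ 61 B per connection
    128M: 134,217,728 B / 2,341,828 connections ≈ 57 B per connection
    256M: 268,435,456 B / 4,869,103 connections ≈ 55 B per connection

The nearly constant per-connection figure would be consistent with a small, fixed amount of per-session state being allocated on the vcl heap and never freed, in line with the suspicion above. It is only a rough indicator, since the heap also serves other allocations.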
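And for the keepalive settings mentioned above, a minimal sketch of the nginx http-level directives (the values are illustrative, not taken from the jira ticket configs):

    http {
        keepalive_timeout  10s;    # reduce from 300 so idle connections are closed quickly
        keepalive_requests 100;    # likewise cap the number of requests per keepalive connection
    }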