Dear Rubina,

Thanks for catching and reporting this!
I suspect that my recent change, which uses two unidirectional sessions in the bihash instead of a single one, exposed a race: while the owning worker is deleting the session, the non-owning worker is trying to update it. That would logically explain the "BUG: .." line (since you neither change the interfaces nor move the traffic around, the 5-tuples should not collide), as well as the later stop.

To take care of this issue, I think I will split the deletion of the session into two stages:

1) deactivating the bihash entries that steer the traffic
2) freeing the per-worker session structure

with a short pause between the two, so that workers with updates in progress can finish updating the structures.

The gerrit change below is the first cut:

https://gerrit.fd.io/r/#/c/12770/

It passes "make test" right now, but I have not kicked its tires much yet; I will do that tomorrow. You can try this change out in your test setup as well and tell me how it behaves.

--a

On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Hi
>
> I run vpp v18.07-rc0~237-g525c9d0f with only 2 interface in stateful acl
> (permit+reflect) and generated sfr traffic using trex v2.27. My rx will
> become 0 after a short while, about 300 sec in my machine.
> Here is vpp status:
>
> root@MYRB:~# service vpp status
> * vpp.service - vector packet processing engine
> Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset:
> enabled)
> Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s
> ago
> Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
> Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
> (code=killed, signal=ABRT)
> Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited,
> status=0/SUCCESS)
> Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
> Main PID: 31754 (code=killed, signal=ABRT)
>
> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session
> LSB16(sw_if_index) and 5-tuple collision!
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGCONT, PC 0x7f1fb591cac0
> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing
> engine...
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGTERM, PC 0x7f1fb3c40867
> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker
> thread deadlock
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited,
> code=killed, status=6/ABRT
> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>
> I attach my vpp configs to this email. I also run this test with the same
> config and added 4 interface instead of two. But in this case nothing
> happened to vpp and it was functional for a long time.
>
> Thanks,
> RB
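P.S. To illustrate the two-stage session deletion proposed in the reply, here is a minimal, single-threaded sketch. It is not the code from the gerrit change and does not use any VPP API: every name in it (session_table_t, session_deactivate, session_try_free, GRACE_TICKS and so on) is hypothetical, the steer_valid array is only a stand-in for the bihash entries, and the tick counter is only a stand-in for the pause. A real multi-worker version would also need proper memory ordering between the workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SESSIONS 8
#define GRACE_TICKS  2   /* hypothetical pause between the two stages */

typedef enum {
  SESSION_FREE = 0,      /* slot unused */
  SESSION_ACTIVE,        /* steered to and usable */
  SESSION_DEACTIVATED    /* no longer steered to, not yet freed */
} session_state_t;

typedef struct {
  session_state_t state;
  uint64_t key;             /* stand-in for the 5-tuple */
  uint64_t deactivated_at;  /* tick at which stage 1 ran */
} session_t;

typedef struct {
  session_t sessions[MAX_SESSIONS];
  bool steer_valid[MAX_SESSIONS];  /* stand-in for the bihash entries */
} session_table_t;

/* Create a session and the entry that steers traffic to it. */
static int session_add(session_table_t *t, uint64_t key)
{
  for (int i = 0; i < MAX_SESSIONS; i++) {
    if (t->sessions[i].state == SESSION_FREE) {
      t->sessions[i].state = SESSION_ACTIVE;
      t->sessions[i].key = key;
      t->steer_valid[i] = true;
      return i;
    }
  }
  return -1;  /* table full */
}

/* Packet path: a session is found only through a valid steering entry. */
static int session_lookup(const session_table_t *t, uint64_t key)
{
  for (int i = 0; i < MAX_SESSIONS; i++) {
    if (t->steer_valid[i] && t->sessions[i].key == key)
      return i;
  }
  return -1;
}

/* Stage 1: stop steering new traffic to the session, keep its memory. */
static void session_deactivate(session_table_t *t, int idx, uint64_t now)
{
  assert(t->sessions[idx].state == SESSION_ACTIVE);
  t->steer_valid[idx] = false;
  t->sessions[idx].state = SESSION_DEACTIVATED;
  t->sessions[idx].deactivated_at = now;
}

/* Stage 2: free the structure, but only once the pause has elapsed, so
   that updates that started before stage 1 have had time to finish. */
static bool session_try_free(session_table_t *t, int idx, uint64_t now)
{
  session_t *s = &t->sessions[idx];
  if (s->state != SESSION_DEACTIVATED)
    return false;
  if (now - s->deactivated_at < GRACE_TICKS)
    return false;
  s->state = SESSION_FREE;
  return true;
}
```

The point of the split is that after stage 1 no new lookup can reach the session, while its memory stays valid until stage 2, which gives an update already in flight on another worker something safe to write to.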