Dear Rubina,

Thanks for catching and reporting this!

I suspect that my recent change, which uses two unidirectional sessions
in the bihash instead of a single one, triggered a race: while the
owning worker is deleting the session, the non-owning worker is trying
to update it. That would explain the "BUG: .." line (since you are
neither changing the interfaces nor moving the traffic around, the
5-tuples should not collide), as well as the later stop.

To take care of this issue, I think I will split the deletion of the
session in two stages:
1) deactivation of the bihash entries that steer the traffic
2) freeing up the per-worker session structure

and have a short pause in between these two, so that any workers that
are still in the middle of an update can finish updating the structures.
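
To make the two-stage idea concrete, here is a rough sketch. It is not the code in the gerrit change: every name in it (session_table_t, session_deactivate, session_try_free, GRACE_TICKS and so on) is hypothetical, a plain array with a "steered" flag stands in for the bihash entries, and a tick counter stands in for the pause. It is single-threaded and only models the ordering, not the actual locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SESSIONS 8
#define GRACE_TICKS 2 /* hypothetical length of the pause between stages */

typedef enum { SESS_FREE = 0, SESS_ACTIVE, SESS_DRAINING } sess_state_t;

typedef struct {
    sess_state_t state;
    uint64_t key;         /* stand-in for the 5-tuple */
    uint64_t last_update; /* tick of the most recent update */
    uint64_t free_after;  /* earliest tick at which stage 2 may run */
    bool steered;         /* stand-in for "bihash entry is present" */
} session_t;

typedef struct { session_t s[MAX_SESSIONS]; } session_table_t;

/* Create a session in the first free slot; returns its index or -1. */
int session_add(session_table_t *t, uint64_t key, uint64_t now) {
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (t->s[i].state == SESS_FREE) {
            t->s[i] = (session_t){ .state = SESS_ACTIVE, .key = key,
                                   .last_update = now, .steered = true };
            return i;
        }
    }
    return -1;
}

/* Lookup as a new packet would do it: only steered entries are visible. */
int session_lookup(const session_table_t *t, uint64_t key) {
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (t->s[i].steered && t->s[i].key == key)
            return i;
    }
    return -1;
}

/* Update by a worker that already holds the index (e.g. mid-packet). */
bool session_update(session_table_t *t, int idx, uint64_t now) {
    assert(idx >= 0 && idx < MAX_SESSIONS);
    if (t->s[idx].state == SESS_FREE)
        return false; /* structure already released */
    t->s[idx].last_update = now;
    return true;
}

/* Stage 1: stop steering traffic to the session, start the pause. */
void session_deactivate(session_table_t *t, int idx, uint64_t now) {
    assert(idx >= 0 && idx < MAX_SESSIONS);
    t->s[idx].steered = false;
    t->s[idx].state = SESS_DRAINING;
    t->s[idx].free_after = now + GRACE_TICKS;
}

/* Stage 2: free the structure, but only once the pause has elapsed. */
bool session_try_free(session_table_t *t, int idx, uint64_t now) {
    assert(idx >= 0 && idx < MAX_SESSIONS);
    if (t->s[idx].state != SESS_DRAINING || now < t->s[idx].free_after)
        return false;
    t->s[idx].state = SESS_FREE;
    return true;
}
```

The point of the ordering is that after stage 1 no new lookup can find the session, while a worker that already holds the index can still complete its update; stage 2 refuses to release the structure until the pause has passed.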

The below gerrit is the first cut:

https://gerrit.fd.io/r/#/c/12770/

It passes "make test" right now, but I have not kicked its tires much
yet; I will do so tomorrow.

You can try this change out in your test setup as well and let me know how it goes.

--a

On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Hi
>
> I ran vpp v18.07-rc0~237-g525c9d0f with only 2 interfaces in stateful acl
> (permit+reflect) and generated sfr traffic using trex v2.27. My rx drops
> to 0 after a short while, about 300 sec on my machine. Here is the vpp
> status:
>
> root@MYRB:~# service vpp status
> * vpp.service - vector packet processing engine
>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset:
> enabled)
>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s
> ago
>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
> (code=killed, signal=ABRT)
>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited,
> status=0/SUCCESS)
>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>  Main PID: 31754 (code=killed, signal=ABRT)
>
> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session
> LSB16(sw_if_index) and 5-tuple collision!
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGCONT, PC 0x7f1fb591cac0
> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing
> engine...
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGTERM, PC 0x7f1fb3c40867
> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker
> thread deadlock
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited,
> code=killed, status=6/ABRT
> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>
> I attach my vpp configs to this email. I also ran this test with the same
> config but with 4 interfaces instead of two. In that case nothing
> happened to vpp and it stayed functional for a long time.
>
> Thanks,
> RB
>
