Dear Andrew,

I tested your patch and my problem still exists, but the service status has
changed: it no longer shows any information about the deadlock. Do you have
any idea how I can provide you with more information?

root@MYRB:~# service vpp status
* vpp.service - vector packet processing engine
   Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: 
enabled)
   Active: inactive (dead)

May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded 
plugin: udp_ping_test_plugin.so
May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded 
plugin: stn_test_plugin.so
May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init args: -c 
1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 
0000:08:00.1 -w 0000:08
May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 
--huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 
-w 0000:08:00.2 -w 000
May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not enough DPDK 
crypto resources, default to OpenSSL
May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal 
SIGCONT, PC 0x7fa535dfbac0
May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing engine...
May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal 
SIGTERM, PC 0x7fa534121867
May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing engine.


________________________________
From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
Sent: Monday, May 28, 2018 5:58 PM
To: Rubina Bianchi
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Rx stuck to 0 after a while

Dear Rubina,

Thanks for catching and reporting this!

I suspect that my recent change to use two unidirectional sessions in
the bihash, instead of a single one, triggered a race whereby the
non-owning worker tries to update the session while the owning worker
is deleting it. That would logically explain the "BUG: .." line (since
you neither change the interfaces nor move the traffic around, the
5-tuples should not collide), as well as the later stop.

To take care of this issue, I think I will split the deletion of the
session in two stages:
1) deactivation of the bihash entries that steer the traffic
2) freeing up the per-worker session structure

and have a short pause in between these two stages so that workers
that are still mid-update can finish updating the structures.
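
As a rough illustration of the two stages above (this is not the code in
the gerrit change, and VPP itself is written in C), here is a minimal
single-threaded Python sketch. Every name in it (SessionTable, deactivate,
reap, GRACE_TICKS) is hypothetical, the dict only stands in for the bihash,
and the "pause" is modelled as a logical tick count rather than real time:

```python
# Hypothetical model of two-stage session deletion; none of these names exist in VPP.
from dataclasses import dataclass
from typing import Optional, Tuple

GRACE_TICKS = 2  # assumed pause between stage 1 and stage 2, in logical ticks


@dataclass
class Session:
    five_tuple: Tuple
    packets: int = 0
    retired_at: Optional[int] = None  # set when stage 1 runs


class SessionTable:
    def __init__(self):
        self.lookup = {}    # stands in for the bihash entries that steer traffic
        self.pool = {}      # stands in for the per-worker session pool
        self.pending = []   # session ids waiting for stage 2
        self.next_id = 0

    def add(self, five_tuple):
        sid = self.next_id
        self.next_id += 1
        self.pool[sid] = Session(five_tuple)
        self.lookup[five_tuple] = sid
        return sid

    def find(self, five_tuple):
        return self.lookup.get(five_tuple)

    def deactivate(self, sid, now):
        """Stage 1: stop steering traffic to the session, but keep its memory."""
        session = self.pool[sid]
        self.lookup.pop(session.five_tuple, None)
        session.retired_at = now
        self.pending.append(sid)

    def reap(self, now):
        """Stage 2: free sessions whose grace period has elapsed."""
        freed, still_waiting = [], []
        for sid in self.pending:
            if now - self.pool[sid].retired_at >= GRACE_TICKS:
                del self.pool[sid]
                freed.append(sid)
            else:
                still_waiting.append(sid)
        self.pending = still_waiting
        return freed


table = SessionTable()
key = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
sid = table.add(key)
held = table.pool[sid]         # another worker already holds this session
table.deactivate(sid, now=0)   # stage 1: new lookups no longer find it
held.packets += 1              # the in-flight update still hits live memory
print(table.find(key))         # None
print(table.reap(now=1))       # [] (grace period not over yet)
print(table.reap(now=2))       # [0] (stage 2 frees the session)
```

The point of the split is that a worker which looked the session up just
before stage 1 can still finish its update safely, while no new lookups
can reach the session once its entries have been deactivated.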

The below gerrit is the first cut:

https://gerrit.fd.io/r/#/c/12770/

It passes the make test right now but I did not kick its tires too
much yet, will do tomorrow.

You can try this change out in your test setup as well and tell me how it feels.

--a

On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Hi
>
> I ran vpp v18.07-rc0~237-g525c9d0f with only 2 interfaces in stateful acl
> (permit+reflect) and generated sfr traffic using trex v2.27. My rx drops
> to 0 after a short while, about 300 sec on my machine. Here is the vpp
> status:
>
> root@MYRB:~# service vpp status
> * vpp.service - vector packet processing engine
>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset:
> enabled)
>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s
> ago
>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
> (code=killed, signal=ABRT)
>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited,
> status=0/SUCCESS)
>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>  Main PID: 31754 (code=killed, signal=ABRT)
>
> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session
> LSB16(sw_if_index) and 5-tuple collision!
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGCONT, PC 0x7f1fb591cac0
> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing
> engine...
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGTERM, PC 0x7f1fb3c40867
> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker
> thread deadlock
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited,
> code=killed, status=6/ABRT
> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>
> I have attached my vpp configs to this email. I also ran this test with the
> same config but with 4 interfaces instead of two. In that case nothing
> happened to vpp and it stayed functional for a long time.
>
> Thanks,
> RB
>
