Dear Andrew,

I cleaned everything and built new deb packages with your patch once again. With your patch I no longer see the deadlock, but I still have a throughput problem in my scenario.

-Per port stats table
      ports |               0 |               1
 -----------------------------------------------------------------------------------------
   opackets |       474826597 |       452028770
     obytes |    207843848531 |    199591809555
   ipackets |        71010677 |        72028456
     ibytes |     31441646551 |     31687562468
    ierrors |               0 |               0
    oerrors |               0 |               0
      Tx Bw |       9.56 Gbps |       9.16 Gbps

-Global stats enabled
 Cpu Utilization  : 88.4 %  7.1 Gb/core
 Platform_factor  : 1.0
 Total-Tx         : 18.72 Gbps
 Total-Rx         : 59.30 Mbps
 Total-PPS        : 5.31 Mpps
 Total-CPS        : 79.79 Kcps

 Expected-PPS     : 9.02 Mpps
 Expected-CPS     : 135.31 Kcps
 Expected-BPS     : 31.77 Gbps

 Active-flows     : 88837     Clients : 252     Socket-util : 0.5598 %
 Open-flows       : 14708455  Servers : 65532   Socket : 88837  Socket/Clients : 352.5
 Total_queue_full : 328355248
 drop-rate        : 18.66 Gbps
 current time     : 180.9 sec
 test duration    : 99819.1 sec

In the best case (4 interfaces on one NUMA node, only 2 of them with an ACL) my device (HP DL380 G9) reaches its maximum throughput (18.72 Gbps), but in the worst case (4 interfaces on one NUMA node, all of them with an ACL) the throughput drops from that maximum to around 60 Mbps. So the patch prevents the deadlock in my case, but the throughput is the same as before.

________________________________
From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
Sent: Tuesday, May 29, 2018 10:11 AM
To: Rubina Bianchi
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Rx stuck to 0 after a while

Dear Rubina,

thank you for quickly checking it!

Judging by the logs the VPP quits, so I would say there should be a core file, could you check?

If you find it (doublecheck by the timestamps that it is indeed the fresh one), you can load it in gdb (using gdb 'path-to-vpp-binary' 'path-to-core') and then get the backtrace using 'bt', this will give more idea on what is going on.

--a

On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Dear Andrew
>
> I tested your patch and my problem still exist, but my service status
> changed and now there isn't any information about deadlock problem.
> Do you have any idea about how I can provide you more information?
>
> root@MYRB:~# service vpp status
> * vpp.service - vector packet processing engine
>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
>    Active: inactive (dead)
>
> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: udp_ping_test_plugin.so
> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: stn_test_plugin.so
> May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08
> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08:00.2 -w 000
> May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not enough DPDK crypto resources, default to OpenSSL
> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGCONT, PC 0x7fa535dfbac0
> May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
> May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing engine...
> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGTERM, PC 0x7fa534121867
> May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing engine.
>
>
> ________________________________
> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
> Sent: Monday, May 28, 2018 5:58 PM
> To: Rubina Bianchi
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>
> Dear Rubina,
>
> Thanks for catching and reporting this!
>
> I suspect what might be happening is my recent change of using two
> unidirectional sessions in bihash vs. the single one triggered a race,
> whereby as the owning worker is deleting the session,
> the non-owning worker is trying to update it. That would logically
> explain the "BUG: .."
> line (since you don't change the interfaces nor
> moving the traffic around, the 5 tuples should not collide), and as
> well the later stop.
>
> To take care of this issue, I think I will split the deletion of the
> session in two stages:
> 1) deactivation of the bihash entries that steer the traffic
> 2) freeing up the per-worker session structure
>
> and have a little pause time inbetween these two so that the
> workers-in-progress could
> finish updating the structures.
>
> The below gerrit is the first cut:
>
> https://gerrit.fd.io/r/#/c/12770/
>
> It passes the make test right now but I did not kick its tires too
> much yet, will do tomorrow.
>
> You can try this change out in your test setup as well and tell me how it
> feels.
>
> --a
>
> On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>> Hi
>>
>> I run vpp v18.07-rc0~237-g525c9d0f with only 2 interface in stateful acl
>> (permit+reflect) and generated sfr traffic using trex v2.27. My rx will
>> become 0 after a short while, about 300 sec in my machine. Here is vpp
>> status:
>>
>> root@MYRB:~# service vpp status
>> * vpp.service - vector packet processing engine
>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
>>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s ago
>>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf (code=killed, signal=ABRT)
>>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited, status=0/SUCCESS)
>>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>  Main PID: 31754 (code=killed, signal=ABRT)
>>
>> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session LSB16(sw_if_index) and 5-tuple collision!
>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal SIGCONT, PC 0x7f1fb591cac0
>> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
>> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing engine...
>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal SIGTERM, PC 0x7f1fb3c40867
>> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker thread deadlock
>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
>> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>>
>> I attach my vpp configs to this email. I also run this test with the same
>> config and added 4 interface instead of two. But in this case nothing
>> happened to vpp and it was functional for a long time.
>>
>> Thanks,
>> RB
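
For readers following the thread: the two-stage session deletion proposed in the May 28 reply (first deactivate the bihash entries that steer the traffic, then free the per-worker session structure after a short pause) can be pictured with the toy model below. This is only a sketch under stated assumptions, not the code from the gerrit change. A plain dict stands in for the bihash, a logical tick counter stands in for real time, and every name here (`SessionTable`, `deactivate`, `reap`, `GRACE_TICKS`) is hypothetical and does not come from VPP.

```python
from dataclasses import dataclass, field

# Hypothetical pause between the two stages, in logical ticks.
GRACE_TICKS = 2


@dataclass
class Session:
    five_tuple: tuple
    packets: int = 0


@dataclass
class SessionTable:
    # 5-tuple -> session id; a stand-in for the bihash that steers traffic.
    lookup: dict = field(default_factory=dict)
    # session id -> Session; a stand-in for the per-worker session structure.
    pool: dict = field(default_factory=dict)
    # (free_at_tick, session id) pairs waiting for stage 2.
    pending: list = field(default_factory=list)
    next_id: int = 0

    def add(self, five_tuple):
        sid = self.next_id
        self.next_id += 1
        self.pool[sid] = Session(five_tuple)
        self.lookup[five_tuple] = sid
        return sid

    def find(self, five_tuple):
        return self.lookup.get(five_tuple)

    def deactivate(self, sid, now):
        """Stage 1: stop steering traffic to the session, keep its memory."""
        self.lookup.pop(self.pool[sid].five_tuple, None)
        self.pending.append((now + GRACE_TICKS, sid))

    def reap(self, now):
        """Stage 2: free sessions whose grace period has expired."""
        still_waiting = []
        for free_at, sid in self.pending:
            if now >= free_at:
                del self.pool[sid]
            else:
                still_waiting.append((free_at, sid))
        self.pending = still_waiting
```

In this model, a worker that already holds a session id when `deactivate` runs can still update `pool[sid]` until `reap` is called at or after the deadline, while new lookups for the same 5-tuple miss immediately. That is the property the pause is meant to provide; how the real patch implements the pause and which structures it touches is not shown in this thread.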