Dear Andrew

I cleaned everything and built new deb packages with your patch once again.
With your patch I no longer see the deadlock, but I still have a throughput
problem in my scenario.

-Per port stats table
      ports |               0 |               1
 -----------------------------------------------------------------------------------------
   opackets |       474826597 |       452028770
     obytes |    207843848531 |    199591809555
   ipackets |        71010677 |        72028456
     ibytes |     31441646551 |     31687562468
    ierrors |               0 |               0
    oerrors |               0 |               0
      Tx Bw |       9.56 Gbps |       9.16 Gbps

-Global stats enabled
 Cpu Utilization : 88.4  %  7.1 Gb/core
 Platform_factor : 1.0
 Total-Tx        :      18.72 Gbps
 Total-Rx        :      59.30 Mbps
 Total-PPS       :       5.31 Mpps
 Total-CPS       :      79.79 Kcps

 Expected-PPS    :       9.02 Mpps
 Expected-CPS    :     135.31 Kcps
 Expected-BPS    :      31.77 Gbps

 Active-flows    :    88837  Clients :      252   Socket-util : 0.5598 %
 Open-flows      : 14708455  Servers :    65532   Socket :    88837   Socket/Clients :  352.5
 Total_queue_full : 328355248
 drop-rate       :      18.66 Gbps
 current time    : 180.9 sec
 test duration   : 99819.1 sec

In the best case (four interfaces on one NUMA node, with ACLs on only two of
them) my device (HP DL380 G9) reaches its maximum throughput (18.72 Gbps),
but in the worst case (four interfaces on one NUMA node, all of them with
ACLs) throughput drops from that maximum to around 60 Mbps. So the patch
just prevents the deadlock in my case; throughput is the same as before.
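
As an aside, the two-stage session deletion Andrew outlines in the quoted
thread below can be sketched roughly like this. This is a minimal
illustration in Python, not VPP code; all names here (SessionTable, reap,
the 0.5 s grace period) are invented for the example:

```python
import time
import threading

class SessionTable:
    """Toy model of deactivate-then-free session deletion.

    Stage 1 removes the lookup entry so no new packets reach the
    session; stage 2 frees the per-worker state only after a grace
    period, so workers still holding a reference can finish first.
    """

    def __init__(self):
        self.bihash = {}        # 5-tuple -> session id (stands in for the bihash)
        self.sessions = {}      # session id -> per-worker state
        self.pending_free = []  # (deadline, session id) awaiting stage 2
        self.lock = threading.Lock()
        self.grace = 0.5        # seconds; illustrative value only

    def add(self, tuple5, sid, state):
        with self.lock:
            self.sessions[sid] = state
            self.bihash[tuple5] = sid

    def delete(self, tuple5):
        # Stage 1: deactivate the entry that steers traffic.
        with self.lock:
            sid = self.bihash.pop(tuple5, None)
            if sid is not None:
                self.pending_free.append((time.monotonic() + self.grace, sid))

    def reap(self):
        # Stage 2: free structures whose grace period has expired.
        with self.lock:
            now = time.monotonic()
            keep = []
            for deadline, sid in self.pending_free:
                if deadline <= now:
                    self.sessions.pop(sid, None)
                else:
                    keep.append((deadline, sid))
            self.pending_free = keep
```

The point of the split is that the lookup entry disappears immediately,
while the per-worker state outlives it for the grace period, so a worker
that looked the session up just before the delete never touches freed
memory.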

________________________________
From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
Sent: Tuesday, May 29, 2018 10:11 AM
To: Rubina Bianchi
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Rx stuck to 0 after a while

Dear Rubina,

thank you for quickly checking it!

Judging by the logs the VPP quits, so I would say there should be a
core file, could you check ?

If you find it (doublecheck by the timestamps that it is indeed the
fresh one), you can load it in gdb (using gdb 'path-to-vpp-binary'
'path-to-core') and then get the backtrace using 'bt', this will give
more idea on what is going on.
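
A concrete invocation might look like the following; the binary and core
paths here are examples only, substitute the ones on your system (check
/proc/sys/kernel/core_pattern to see where cores land):

```shell
# Load the fresh core into gdb and print the backtrace non-interactively.
gdb /usr/bin/vpp /var/crash/core.vpp -batch -ex bt
```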

--a

On 5/29/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Dear Andrew
>
> I tested your patch and my problem still exists, but my service status
> changed and now there is no information about the deadlock problem any
> more. Do you have any idea how I can provide you more information?
>
> root@MYRB:~# service vpp status
> * vpp.service - vector packet processing engine
>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
>    Active: inactive (dead)
>
> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: udp_ping_test_plugin.so
> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: stn_test_plugin.so
> May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08
> May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08:00.2 -w 000
> May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not enough DPDK crypto resources, default to OpenSSL
> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGCONT, PC 0x7fa535dfbac0
> May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
> May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing engine...
> May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGTERM, PC 0x7fa534121867
> May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing engine.
>
>
> ________________________________
> From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
> Sent: Monday, May 28, 2018 5:58 PM
> To: Rubina Bianchi
> Cc: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] Rx stuck to 0 after a while
>
> Dear Rubina,
>
> Thanks for catching and reporting this!
>
> I suspect what might be happening is that my recent change to use two
> unidirectional sessions in the bihash instead of a single one triggered a
> race, whereby as the owning worker is deleting the session, the non-owning
> worker is trying to update it. That would logically explain the "BUG: .."
> line (since you don't change the interfaces nor move the traffic around,
> the 5-tuples should not collide), as well as the later stop.
>
> To take care of this issue, I think I will split the deletion of the
> session in two stages:
> 1) deactivation of the bihash entries that steer the traffic
> 2) freeing up the per-worker session structure
>
> and have a little pause in between these two stages so that the
> workers-in-progress can finish updating the structures.
>
> The below gerrit is the first cut:
>
> https://gerrit.fd.io/r/#/c/12770/
>
> It passes the make test right now but I did not kick its tires too
> much yet, will do tomorrow.
>
> You can try this change out in your test setup as well and tell me how it
> feels.
>
> --a
>
> On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
>> Hi
>>
>> I ran vpp v18.07-rc0~237-g525c9d0f with only 2 interfaces in stateful acl
>> (permit+reflect) and generated SFR traffic using trex v2.27. My rx drops
>> to 0 after a short while, about 300 sec on my machine. Here is the vpp
>> status:
>>
>> root@MYRB:~# service vpp status
>> * vpp.service - vector packet processing engine
>>    Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
>>    Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s ago
>>   Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>   Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf (code=killed, signal=ABRT)
>>   Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited, status=0/SUCCESS)
>>   Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
>>  Main PID: 31754 (code=killed, signal=ABRT)
>>
>> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session LSB16(sw_if_index) and 5-tuple collision!
>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal SIGCONT, PC 0x7f1fb591cac0
>> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
>> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing engine...
>> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal SIGTERM, PC 0x7f1fb3c40867
>> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker thread deadlock
>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
>> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
>> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>>
>> I attached my vpp configs to this email. I also ran this test with the
>> same config but with 4 interfaces instead of two. In that case nothing
>> happened to vpp and it stayed functional for a long time.
>>
>> Thanks,
>> RB
>>
>
