Hi John,

The internal mechanism is very clear to me now. Do you have any thoughts about the deadlock on the main thread?
BR/Lollita Liu

From: John Lo (loj) [mailto:l...@cisco.com]
Sent: Tuesday, January 23, 2018 11:18 AM
To: Lollita Liu <lollita....@ericsson.com>; vpp-dev@lists.fd.io
Cc: David Yu Z <david.z...@ericsson.com>; Kingwel Xie <kingwel....@ericsson.com>; Terry Zhang Z <terry.z.zh...@ericsson.com>; Jordy You <jordy....@ericsson.com>
Subject: RE: Question and bug found on GTP performance testing

Hi Lolita,

Thank you for providing the information from your performance test, with the observed behavior and problems.

On interface creation, including tunnels, VPP always creates dedicated output and tx nodes for each interface. As you correctly observed, these dedicated tx and output nodes are not used for most tunnel interfaces such as GTPU and VXLAN; all tunnel interfaces of the same type use an existing tunnel-type-specific encap node as their output node. I can see that for large-scale tunnel deployments, creating a large number of these unused output and tx nodes can be an issue, especially when multiple worker threads are in use: the worker threads are blocked from forwarding packets while the main thread is busy creating these nodes and doing the setup for multiple worker threads. I believe we should improve VPP interface creation to allow interfaces, such as tunnels, to be created with an existing (encap) node specified as the interface output node, without creating dedicated tx and output nodes.

Your observation that the forwarding PPS impact occurs only during initial tunnel creation, and not on subsequent delete and create, is expected. On tunnel deletion, the associated interfaces are not deleted but kept in a pool for reuse by subsequent creation of tunnels of the same type. This may not be the best approach for interface usage flexibility, but it certainly helps the efficiency of tunnel delete/create cycles.

I will work on the interface creation improvement described above when I get a chance.
I can let you know when a patch is available on vpp master for you to try. As for the 18.01 release, it is probably too late to include this improvement.

Regards,
John

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Lollita Liu
Sent: Monday, January 22, 2018 5:04 AM
To: vpp-dev@lists.fd.io
Cc: David Yu Z <david.z...@ericsson.com>; Kingwel Xie <kingwel....@ericsson.com>; Terry Zhang Z <terry.z.zh...@ericsson.com>; Jordy You <jordy....@ericsson.com>
Subject: [vpp-dev] Question and bug found on GTP performance testing

Hi,

We are doing performance testing on the VPP source code, measuring the GTPU performance impact of creating/removing tunnels. We found some curious things and one bug.

Testing GTPU encapsulation on one CPU, across different rx and tx ports on the same NUMA node, with 10K pre-created GTPU tunnels all carrying traffic: the result is 4.7 Mpps @ 64B.

Testing the same setup while creating another 10K GTPU tunnels at the same time: the result is about 400 Kpps @ 64B. The tunnel creation commands are "create gtpu tunnel src 1.4.1.1 dst 1.4.1.2 teid 1 decap-next ip4" and "ip route add 10.4.0.1/32 via gtpu_tunnel0".

As you can see, the throughput impact is huge. It looks like many nodes named gtpu_tunnelxx-tx and gtpu_tunnelxx-output are created, and all worker threads wait on the node graph update. Yet in the output of "show runtime", none of these nodes are ever called: in the source code, GTPU encapsulation is taken over by the gtpu4-encap node via "hi->output_node_index = encap_index;". What are those gtpu_tunnel nodes used for, given that they appear to be unused?
We tried another case with the following procedure:

(1) Create 10K GTPU tunnels
(2) Run rx-tx on the same NUMA node using 1G hugepages, with the 10K GTPU tunnels carrying traffic
(3) Create another 30K GTPU tunnels
(4) Remove the last 30K GTPU tunnels

The main thread fell into a deadlock: no response on the command line, but no impact on the worker threads. In the GDB output, mheap_maybe_lock appears twice on the same stack: once in the interrupted allocation (frame #9) and again inside the timer signal handler's allocation path (frame #0), so the main thread is waiting on a lock it already holds.

Thread 1 (Thread 0x7f335bef5740 (LWP 27464)):
#0  0x00007f335ab518d9 in mheap_maybe_lock (v=0x7f33199dd000) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:66
#1  mheap_get_aligned (v=0x7f33199dd000, n_user_data_bytes=8, n_user_data_bytes@entry=5, align=<optimized out>, align@entry=4, align_offset=0, align_offset@entry=4, offset_return=offset_return@entry=0x7f331a968618) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:675
#2  0x00007f335ab7b0f7 in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=4, align=4, size=5) at /home/vpp/vpp/build-data/../src/vppinfra/mem.h:91
#3  vec_resize_allocate_memory (v=<optimized out>, length_increment=length_increment@entry=1, data_bytes=5, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=4) at /home/vpp/vpp/build-data/../src/vppinfra/vec.c:59
#4  0x00007f335b8a10ba in _vec_resize (data_align=<optimized out>, header_bytes=<optimized out>, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/vec.h:142
#5  unix_cli_add_pending_output (uf=0x7f331ba606b4, buffer=0x7f335b8b774f "\r", buffer_bytes=1, cf=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:528
#6  0x00007f335b8a3fcd in unix_cli_file_welcome (cf=0x7f331adaf204, cm=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:1137
#7  0x00007f335ab85fd1 in timer_interrupt (signum=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/timer.c:125
#8  <signal handler called>
#9  0x00007f335ab518d9 in mheap_maybe_lock (v=0x7f33199dd000) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:66
#10 mheap_get_aligned (v=0x7f33199dd000, n_user_data_bytes=n_user_data_bytes@entry=12, align=<optimized out>, align@entry=4, align_offset=0, align_offset@entry=4, offset_return=offset_return@entry=0x7f331a968e68) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:675
#11 0x00007f335ab7b0f7 in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=4, align=4, size=12) at /home/vpp/vpp/build-data/../src/vppinfra/mem.h:91
#12 vec_resize_allocate_memory (v=v@entry=0x0, length_increment=1, data_bytes=12, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=4) at /home/vpp/vpp/build-data/../src/vppinfra/vec.c:59
#13 0x00007f335b8a5eca in _vec_resize (data_align=0, header_bytes=0, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/vec.h:142
#14 vlib_process_get_events (data_vector=<synthetic pointer>, vm=0x7f335bac42c0 <vlib_global_main>) at /home/vpp/vpp/build-data/../src/vlib/node_funcs.h:562
#15 unix_cli_process (vm=0x7f335bac42c0 <vlib_global_main>, rt=0x7f331a958000, f=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:2414
#16 0x00007f335b86fd96 in vlib_process_bootstrap (_a=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/main.c:1231
#17 0x00007f335ab463d8 in clib_calljmp () at /home/vpp/vpp/build-data/../src/vppinfra/longjmp.S:110
#18 0x00007f331b9dcc20 in ?? ()
#19 0x00007f335b870f49 in vlib_process_startup (f=0x0, p=0x7f331a958000, vm=0x7f335bac42c0 <vlib_global_main>) at /home/vpp/vpp/build-data/../src/vlib/main.c:1253
#20 dispatch_process (vm=0x7f335bac42c0 <vlib_global_main>, p=0x7f331a958000, last_time_stamp=0, f=0x0) at /home/vpp/vpp/build-data/../src/vlib/main.c:1296

We then modified the previous steps:

(1) Create 10K GTPU tunnels
(2) Run rx-tx on the same NUMA node using 1G hugepages, with the 10K GTPU tunnels carrying traffic
(3) Create another 10K GTPU tunnels
(4) Remove and re-create the last 1K GTPU tunnels repeatedly, at 10-second intervals

The result is 4.6 Mpps @ 64B. It looks like only the first round of GTPU tunnel creation impacts data-plane throughput.

BR/Lollita Liu
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev