Hi,

We are doing performance testing on the GTP-U part of the VPP source code, measuring the throughput impact of creating and removing GTPU tunnels. We found some curious behaviour and one bug.
First, throughput with and without concurrent tunnel creation:

Test 1: GTP-U encapsulation on one CPU, with rx and tx on different ports on the same NUMA node, and 10K pre-created GTPU tunnels all carrying data. The result is 4.7 Mpps @ 64B.

Test 2: the same setup, but while the traffic is running we create another 10K GTPU tunnels. The result drops to about 400 Kpps @ 64B. Each tunnel is created with:

  create gtpu tunnel src 1.4.1.1 dst 1.4.1.2 teid 1 decap-next ip4
  ip route add 10.4.0.1/32 via gtpu_tunnel0

As you can see, the throughput impact is huge. It looks like a large number of nodes named gtpu_tunnelXX-tx and gtpu_tunnelXX-output are created, and all worker threads have to wait for the resulting node-graph updates. However, "show runtime" never shows these nodes being called: in the source code, GTP-U encapsulation is taken over by the gtpu4-encap node with "hi->output_node_index = encap_index;". What are those gtpu_tunnelXX nodes used for?
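For context, the takeover we mean looks roughly like the fragment below. This is a paraphrased sketch of the gtpu plugin's add-tunnel path as we read it, not a verbatim quote; the surrounding variable names (vnm, hw_if_index, is_ip6) are our reconstruction:

/* Paraphrased sketch of the gtpu add-tunnel path (not verbatim source):
 * the tunnel is modelled as a vnet hw interface, but its output node is
 * redirected to the shared gtpu4-encap / gtpu6-encap node, so the
 * per-tunnel gtpu_tunnelXX-output / gtpu_tunnelXX-tx nodes created along
 * with the interface are never dispatched. */
vnet_hw_interface_t *hi = vnet_get_hw_interface (vnm, hw_if_index);
u32 encap_index = is_ip6 ? gtpu6_encap_node.index : gtpu4_encap_node.index;
hi->output_node_index = encap_index;

If that reading is correct, the per-tunnel -output/-tx nodes only exist because every tunnel is a vnet interface, yet each interface creation still grows the node graph and forces the workers to synchronize, which would explain the throughput drop while tunnels are being added.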
Since those nodes seem to be unused, we tried another case with the following procedure:

(1) Create 10K GTPU tunnels.
(2) Run rx/tx on the same NUMA node, using 1G hugepages, with the 10K GTPU tunnels and data on all 10K tunnels.
(3) Create another 30K GTPU tunnels.
(4) Remove the last 30K GTPU tunnels.

The main thread falls into a deadlock: the command line stops responding, while the worker threads are unaffected. In the GDB output, mheap_maybe_lock appears twice on the stack: the CLI timer signal handler (timer_interrupt, frame #7) interrupted the main thread while it was already inside mheap_get_aligned (frames #9-#10), and the handler then allocates again (frames #0-#6), so mheap_maybe_lock is entered a second time from the same thread while the first call may still be in progress. A minimal standalone illustration of this pattern is at the end of this mail.

Thread 1 (Thread 0x7f335bef5740 (LWP 27464)):
#0  0x00007f335ab518d9 in mheap_maybe_lock (v=0x7f33199dd000) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:66
#1  mheap_get_aligned (v=0x7f33199dd000, n_user_data_bytes=8, n_user_data_bytes@entry=5, align=<optimized out>, align@entry=4, align_offset=0, align_offset@entry=4, offset_return=offset_return@entry=0x7f331a968618) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:675
#2  0x00007f335ab7b0f7 in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=4, align=4, size=5) at /home/vpp/vpp/build-data/../src/vppinfra/mem.h:91
#3  vec_resize_allocate_memory (v=<optimized out>, length_increment=length_increment@entry=1, data_bytes=5, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=4) at /home/vpp/vpp/build-data/../src/vppinfra/vec.c:59
#4  0x00007f335b8a10ba in _vec_resize (data_align=<optimized out>, header_bytes=<optimized out>, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/vec.h:142
#5  unix_cli_add_pending_output (uf=0x7f331ba606b4, buffer=0x7f335b8b774f "\r", buffer_bytes=1, cf=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:528
#6  0x00007f335b8a3fcd in unix_cli_file_welcome (cf=0x7f331adaf204, cm=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:1137
#7  0x00007f335ab85fd1 in timer_interrupt (signum=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/timer.c:125
#8  <signal handler called>
#9  0x00007f335ab518d9 in mheap_maybe_lock (v=0x7f33199dd000) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:66
#10 mheap_get_aligned (v=0x7f33199dd000, n_user_data_bytes=n_user_data_bytes@entry=12, align=<optimized out>, align@entry=4, align_offset=0, align_offset@entry=4, offset_return=offset_return@entry=0x7f331a968e68) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:675
#11 0x00007f335ab7b0f7 in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=4, align=4, size=12) at /home/vpp/vpp/build-data/../src/vppinfra/mem.h:91
#12 vec_resize_allocate_memory (v=v@entry=0x0, length_increment=1, data_bytes=12, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=4) at /home/vpp/vpp/build-data/../src/vppinfra/vec.c:59
#13 0x00007f335b8a5eca in _vec_resize (data_align=0, header_bytes=0, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/vec.h:142
#14 vlib_process_get_events (data_vector=<synthetic pointer>, vm=0x7f335bac42c0 <vlib_global_main>) at /home/vpp/vpp/build-data/../src/vlib/node_funcs.h:562
#15 unix_cli_process (vm=0x7f335bac42c0 <vlib_global_main>, rt=0x7f331a958000, f=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:2414
#16 0x00007f335b86fd96 in vlib_process_bootstrap (_a=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/main.c:1231
#17 0x00007f335ab463d8 in clib_calljmp () at /home/vpp/vpp/build-data/../src/vppinfra/longjmp.S:110
#18 0x00007f331b9dcc20 in ?? ()
#19 0x00007f335b870f49 in vlib_process_startup (f=0x0, p=0x7f331a958000, vm=0x7f335bac42c0 <vlib_global_main>) at /home/vpp/vpp/build-data/../src/vlib/main.c:1253
#20 dispatch_process (vm=0x7f335bac42c0 <vlib_global_main>, p=0x7f331a958000, last_time_stamp=0, f=0x0) at /home/vpp/vpp/build-data/../src/vlib/main.c:1296

We then modified the steps:

(1) Create 10K GTPU tunnels.
(2) Run rx/tx on the same NUMA node, using 1G hugepages, with the 10K GTPU tunnels and data on all 10K tunnels.
(3) Create another 10K GTPU tunnels.
(4) Remove and re-create the last 1K GTPU tunnels repeatedly, with a 10 second interval.

The result is 4.6 Mpps @ 64B. It looks like only the first round of GTPU tunnel creation impacts data-plane throughput.
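Coming back to the deadlock above, here is a minimal standalone sketch (plain C with pthreads, not VPP code, all names hypothetical) of the pattern we think the backtrace shows: a non-recursive lock guarding the allocator, a timer signal that arrives while that lock is held, and a signal handler that allocates and therefore tries to take the same lock again from the same thread. In our trace the allocation inside the signal handler comes from the CLI welcome output (frames #5 and #6).

/* deadlock_demo.c (hypothetical file name) -- NOT VPP code, just an
 * illustration of the locking pattern suggested by the backtrace above.
 * Build: cc deadlock_demo.c -o deadlock_demo -pthread
 * The main loop "allocates" under a non-recursive spinlock; a periodic
 * SIGALRM handler also "allocates"; sooner or later the signal lands
 * while the lock is held and the handler spins on it forever. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static pthread_spinlock_t heap_lock;    /* stand-in for the mheap lock */

/* Stand-in for mheap_get_aligned(): every "allocation" takes the lock. */
static void fake_alloc (void)
{
  pthread_spin_lock (&heap_lock);       /* 2nd entry from the handler never returns */
  for (volatile int i = 0; i < 1000; i++)
    ;                                   /* keep the lock held for a little while */
  pthread_spin_unlock (&heap_lock);
}

/* Stand-in for timer_interrupt() -> unix_cli_file_welcome(): the signal
 * handler itself allocates, like frames #0..#7 in the backtrace. */
static void on_timer (int sig)
{
  (void) sig;
  fake_alloc ();
}

int main (void)
{
  struct itimerval it = { { 0, 500 }, { 0, 500 } };   /* fire every 500 us */

  pthread_spin_init (&heap_lock, PTHREAD_PROCESS_PRIVATE);
  signal (SIGALRM, on_timer);
  setitimer (ITIMER_REAL, &it, NULL);

  for (unsigned long n = 1;; n++)       /* stand-in for the CLI process loop */
    {
      fake_alloc ();
      if ((n % 1000000) == 0)
        fprintf (stderr, "still alive after %lu allocs\n", n);
    }
}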
BR/Lollita Liu

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev