Hi,

We are doing performance testing on the GTP-U part of the VPP source code, measuring the throughput impact of creating and removing GTPU tunnels. We found some curious behaviour and one bug.
First, throughput with and without concurrent tunnel creation:

Test 1: GTP-U encapsulation on one CPU, with rx and tx on different ports on the same NUMA node, and 10K pre-created GTPU tunnels all carrying data. The result is 4.7 Mpps @ 64B.

Test 2: the same setup, but while the traffic is running we create another 10K GTPU tunnels. The result drops to about 400 Kpps @ 64B. Each tunnel is created with:

  create gtpu tunnel src 1.4.1.1 dst 1.4.1.2 teid 1 decap-next ip4
  ip route add 10.4.0.1/32 via gtpu_tunnel0

As you can see, the throughput impact is huge. It looks like a large number of nodes named gtpu_tunnelXX-tx and gtpu_tunnelXX-output are created, and all worker threads have to wait for the resulting node-graph updates. However, "show runtime" never shows these nodes being called: in the source code, GTP-U encapsulation is taken over by the gtpu4-encap node with "hi->output_node_index = encap_index;". What are those gtpu_tunnelXX nodes used for?
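For context, the takeover we mean looks roughly like the fragment below. This is a paraphrased sketch of the gtpu plugin's add-tunnel path as we read it, not a verbatim quote; the surrounding variable names (vnm, hw_if_index, is_ip6) are our reconstruction:

/* Paraphrased sketch of the gtpu add-tunnel path (not verbatim source):
 * the tunnel is modelled as a vnet hw interface, but its output node is
 * redirected to the shared gtpu4-encap / gtpu6-encap node, so the
 * per-tunnel gtpu_tunnelXX-output / gtpu_tunnelXX-tx nodes created along
 * with the interface are never dispatched. */
vnet_hw_interface_t *hi = vnet_get_hw_interface (vnm, hw_if_index);
u32 encap_index = is_ip6 ? gtpu6_encap_node.index : gtpu4_encap_node.index;
hi->output_node_index = encap_index;

If that reading is correct, the per-tunnel -output/-tx nodes only exist because every tunnel is a vnet interface, yet each interface creation still grows the node graph and forces the workers to synchronize, which would explain the throughput drop while tunnels are being added.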
Since those nodes seem to be unused, we tried another case with the following procedure:

(1) Create 10K GTPU tunnels.
(2) Run rx/tx on the same NUMA node, using 1G hugepages, with the 10K GTPU tunnels and data on all 10K tunnels.
(3) Create another 30K GTPU tunnels.
(4) Remove the last 30K GTPU tunnels.

The main thread falls into a deadlock: the command line stops responding, while the worker threads are unaffected. In the GDB output, mheap_maybe_lock appears twice on the stack: the CLI timer signal handler (timer_interrupt, frame #7) interrupted the main thread while it was already inside mheap_get_aligned (frames #9-#10), and the handler then allocates again (frames #0-#6), so mheap_maybe_lock is entered a second time from the same thread while the first call may still be in progress. A minimal standalone illustration of this pattern is at the end of this mail.

Thread 1 (Thread 0x7f335bef5740 (LWP 27464)):
#0  0x00007f335ab518d9 in mheap_maybe_lock (v=0x7f33199dd000) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:66
#1  mheap_get_aligned (v=0x7f33199dd000, n_user_data_bytes=8, n_user_data_bytes@entry=5, align=<optimized out>, align@entry=4, align_offset=0, align_offset@entry=4, offset_return=offset_return@entry=0x7f331a968618) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:675
#2  0x00007f335ab7b0f7 in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=4, align=4, size=5) at /home/vpp/vpp/build-data/../src/vppinfra/mem.h:91
#3  vec_resize_allocate_memory (v=<optimized out>, length_increment=length_increment@entry=1, data_bytes=5, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=4) at /home/vpp/vpp/build-data/../src/vppinfra/vec.c:59
#4  0x00007f335b8a10ba in _vec_resize (data_align=<optimized out>, header_bytes=<optimized out>, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/vec.h:142
#5  unix_cli_add_pending_output (uf=0x7f331ba606b4, buffer=0x7f335b8b774f "\r", buffer_bytes=1, cf=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:528
#6  0x00007f335b8a3fcd in unix_cli_file_welcome (cf=0x7f331adaf204, cm=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:1137
#7  0x00007f335ab85fd1 in timer_interrupt (signum=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/timer.c:125
#8  <signal handler called>
#9  0x00007f335ab518d9 in mheap_maybe_lock (v=0x7f33199dd000) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:66
#10 mheap_get_aligned (v=0x7f33199dd000, n_user_data_bytes=n_user_data_bytes@entry=12, align=<optimized out>, align@entry=4, align_offset=0, align_offset@entry=4, offset_return=offset_return@entry=0x7f331a968e68) at /home/vpp/vpp/build-data/../src/vppinfra/mheap.c:675
#11 0x00007f335ab7b0f7 in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=4, align=4, size=12) at /home/vpp/vpp/build-data/../src/vppinfra/mem.h:91
#12 vec_resize_allocate_memory (v=v@entry=0x0, length_increment=1, data_bytes=12, header_bytes=<optimized out>, header_bytes@entry=0, data_align=data_align@entry=4) at /home/vpp/vpp/build-data/../src/vppinfra/vec.c:59
#13 0x00007f335b8a5eca in _vec_resize (data_align=0, header_bytes=0, data_bytes=<optimized out>, length_increment=<optimized out>, v=<optimized out>) at /home/vpp/vpp/build-data/../src/vppinfra/vec.h:142
#14 vlib_process_get_events (data_vector=<synthetic pointer>, vm=0x7f335bac42c0 <vlib_global_main>) at /home/vpp/vpp/build-data/../src/vlib/node_funcs.h:562
#15 unix_cli_process (vm=0x7f335bac42c0 <vlib_global_main>, rt=0x7f331a958000, f=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/unix/cli.c:2414
#16 0x00007f335b86fd96 in vlib_process_bootstrap (_a=<optimized out>) at /home/vpp/vpp/build-data/../src/vlib/main.c:1231
#17 0x00007f335ab463d8 in clib_calljmp () at /home/vpp/vpp/build-data/../src/vppinfra/longjmp.S:110
#18 0x00007f331b9dcc20 in ?? ()
#19 0x00007f335b870f49 in vlib_process_startup (f=0x0, p=0x7f331a958000, vm=0x7f335bac42c0 <vlib_global_main>) at /home/vpp/vpp/build-data/../src/vlib/main.c:1253
#20 dispatch_process (vm=0x7f335bac42c0 <vlib_global_main>, p=0x7f331a958000, last_time_stamp=0, f=0x0) at /home/vpp/vpp/build-data/../src/vlib/main.c:1296

We then modified the steps:

(1) Create 10K GTPU tunnels.
(2) Run rx/tx on the same NUMA node, using 1G hugepages, with the 10K GTPU tunnels and data on all 10K tunnels.
(3) Create another 10K GTPU tunnels.
(4) Remove and re-create the last 1K GTPU tunnels repeatedly, with a 10 second interval.

The result is 4.6 Mpps @ 64B. It looks like only the first round of GTPU tunnel creation impacts data-plane throughput.
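Coming back to the deadlock above, here is a minimal standalone sketch (plain C with pthreads, not VPP code, all names hypothetical) of the pattern we think the backtrace shows: a non-recursive lock guarding the allocator, a timer signal that arrives while that lock is held, and a signal handler that allocates and therefore tries to take the same lock again from the same thread. In our trace the allocation inside the signal handler comes from the CLI welcome output (frames #5 and #6).

/* deadlock_demo.c (hypothetical file name) -- NOT VPP code, just an
 * illustration of the locking pattern suggested by the backtrace above.
 * Build: cc deadlock_demo.c -o deadlock_demo -pthread
 * The main loop "allocates" under a non-recursive spinlock; a periodic
 * SIGALRM handler also "allocates"; sooner or later the signal lands
 * while the lock is held and the handler spins on it forever. */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static pthread_spinlock_t heap_lock;    /* stand-in for the mheap lock */

/* Stand-in for mheap_get_aligned(): every "allocation" takes the lock. */
static void fake_alloc (void)
{
  pthread_spin_lock (&heap_lock);       /* 2nd entry from the handler never returns */
  for (volatile int i = 0; i < 1000; i++)
    ;                                   /* keep the lock held for a little while */
  pthread_spin_unlock (&heap_lock);
}

/* Stand-in for timer_interrupt() -> unix_cli_file_welcome(): the signal
 * handler itself allocates, like frames #0..#7 in the backtrace. */
static void on_timer (int sig)
{
  (void) sig;
  fake_alloc ();
}

int main (void)
{
  struct itimerval it = { { 0, 500 }, { 0, 500 } };   /* fire every 500 us */

  pthread_spin_init (&heap_lock, PTHREAD_PROCESS_PRIVATE);
  signal (SIGALRM, on_timer);
  setitimer (ITIMER_REAL, &it, NULL);

  for (unsigned long n = 1;; n++)       /* stand-in for the CLI process loop */
    {
      fake_alloc ();
      if ((n % 1000000) == 0)
        fprintf (stderr, "still alive after %lu allocs\n", n);
    }
}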
BR/Lollita Liu

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev