Hello,

while building an L2TP tunnel with xl2tpd-1.3.1 I ran into very low
upload performance. On download, the speed is 20 Mbit/s at nearly 100%
CPU utilization (the CPU is a Pentium D 930 at 3 GHz). On upload, the
speed is below 2 Mbit/s at nearly zero CPU utilization.

First, I examined the xl2tpd code and did not find any obvious
problems. Then I compiled it with the -pg option and ran a quick test
with iperf(1): 4 TCP flows, direction toward the L2TP server, 2
minutes per run. gprof then produced pretty strange output:

Full output: http://pastebin.com/dm4wz6VK
granularity: each sample hit covers 4 byte(s) for 11.11% of 0.09 seconds
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 77.8       0.07     0.07    24311     0.00     0.00  write [4]
 22.2       0.09     0.02    35897     0.00     0.00  _thread_sys_sendmsg [7]
  0.0       0.09     0.00   109136     0.00     0.00  memset [46]
  0.0       0.09     0.00   107997     0.00     0.00  ___errno [926]
  0.0       0.09     0.00    71740     0.00     0.00  read_packet [47]
  0.0       0.09     0.00    71726     0.00     0.00  _thread_sys_read [927]
  0.0       0.09     0.00    60154     0.00     0.00  _thread_sys_gettimeofday [928]
  0.0       0.09     0.00    60114     0.00     0.00  process_signal [32]
  0.0       0.09     0.00    60113     0.00     0.00  build_fdset [29]
  0.0       0.09     0.00    60113     0.00     0.00  process_schedule [26]
  0.0       0.09     0.00    60113     0.00     0.00  select [48]
  0.0       0.09     0.00    35897     0.00     0.00  udp_xmit [6]

Another run also gives something strange:

Full output: http://pastebin.com/vXvHPuyj
granularity: each sample hit covers 4 byte(s) for 14.29% of 0.07 seconds
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 71.4       0.05     0.05    24348     0.00     0.00  write [4]
 14.3       0.06     0.01    60265     0.00     0.00  select [6]
 14.3       0.07     0.01       10     1.00     1.00  calloc [7]
  0.0       0.07     0.00   109324     0.00     0.00  memset [55]
  0.0       0.07     0.00   108285     0.00     0.00  ___errno [925]
  0.0       0.07     0.00    71997     0.00     0.00  read_packet [56]
  0.0       0.07     0.00    71980     0.00     0.00  _thread_sys_read [926]
  0.0       0.07     0.00    60282     0.00     0.00  _thread_sys_gettimeofday [927]
  0.0       0.07     0.00    60266     0.00     0.00  process_signal [50]
  0.0       0.07     0.00    60265     0.00     0.00  build_fdset [49]
  0.0       0.07     0.00    60265     0.00     0.00  process_schedule [57]
  0.0       0.07     0.00    36020     0.00     0.00  _thread_sys_sendmsg [928]
  0.0       0.07     0.00    36020     0.00     0.00  udp_xmit [58]

For comparison, I reversed the direction and ran a download test:

Full output: http://pastebin.com/FFF6xGcy
granularity: each sample hit covers 4 byte(s) for 0.01% of 114.30 seconds
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 93.7     107.06   107.06   370804     0.29     0.29  write [5]
  1.8     109.08     2.02   370770     0.01     0.01  recvmsg [6]
  1.4     110.67     1.59   370799     0.00     0.00  _thread_sys_gettimeofday [8]
  1.1     111.90     1.23   370770     0.00     0.30  handle_packet [4]
  1.0     113.01     1.11   370761     0.00     0.00  add_fcs [9]
  0.7     113.81     0.81   370781     0.00     0.00  select [10]
  0.2     114.03     0.21        1   210.00 114172.55  network_thread [3]
  0.1     114.14     0.12                             mcount [11]
  0.0     114.19     0.05   370781     0.00     0.00  build_fdset [12]
  0.0     114.23     0.04   370771     0.00     0.00  get_call [13]
  0.0     114.27     0.03  1112701     0.00     0.00  memset [14]
  0.0     114.28     0.02   370781     0.00     0.00  process_schedule [7]
  0.0     114.30     0.01       29     0.34     0.34  _thread_sys_sendmsg [20]
  0.0     114.30     0.01        8     0.62     0.74  inet_ntoa [24]
  0.0     114.30     0.00   370785     0.00     0.00  swaps [110]
  0.0     114.30     0.00   370782     0.00     0.00  process_signal [41]
  0.0     114.30     0.00   370770     0.00     0.00  recycle_buf [111]

During the upload tests it looks as if xl2tpd performs almost no work
and is stuck somewhere in an I/O operation.

Maybe there are some options that could be tuned to speed up ppp(4)
I/O, or did I miss something in my tests? I am puzzled. Any clues?

-- 
Sergey
