James MacLean wrote:
David Sommerseth wrote:
James MacLean wrote:
Hi Folks,
I have searched around a bit but have not come up with a solid
suggestion for increasing performance in the following environment:
. +150 clients always on, always via COAX modem 15Mb/s down 1.5Mb/s up.
. OpenVPN-2.0.9 and 2.1rc13 tested, setup as single server
. Server Kernel 2.6.25.4
. Server 64bit
. Server CPU % rarely goes above 30
. Server is fed over a 10G link
Currently we get what appears to be only between 5 and 6 MB/s average
using this setup.
If the only activity is over a single tunnel we can get the expected max
(about 14Mb/s to the remote site) for the COAX sites. Once traffic
builds during the day, that number drops.
We know that if we hit it locally we can get 160Mb/s. We also know that
if we do hit it locally and are getting the 160Mb/s, the COAX tunnels
suffer, dropping to almost half of their normal tunnel throughput of
almost 14Mb/s.
So in my small mind, I am thinking we are seeing around 48Mb/s
(6MB/s*8) used, but that we should be able to get over 150Mb/s. The CPU
isn't hurting. It almost feels like there is a governor slowing down the
traffic :).
Important settings from the latest config:
verb 1
dev tap
tun-mtu 1500
tun-mtu-extra 32
mssfix 1468
proto udp
ca SSCert.pem
cert servercert.pem
key serverkey.pem
dh dh1024.pem
tls-auth ./tlspass
keepalive 30 63
ping-timer-rem
persist-tun 1
persist-key 1
cipher none
tcp-queue-limit 4096
sndbuf 131072
rcvbuf 131072
Anyone have any words of wisdom :) ?
Have you tried different ciphers and/or cipher key sizes? I know you
say the server does not suffer from too high a load, but it could be
inefficiency in the cipher algorithm. If that's the case it might just
as well be an OpenSSL issue. It's a shot in the dark, but it would be
good to rule this one out. The default is Blowfish, so I do not really
expect an improvement.
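If you want to rule the cipher path out completely, it could be worth
measuring raw cipher throughput on the server hardware, outside of OpenVPN.
The sketch below is just my own quick test harness, nothing from the OpenVPN
tree; it assumes the OpenSSL EVP interface, uses an arbitrary iteration
count, and compares Blowfish-CBC (the OpenVPN default) against AES-128-CBC:

/* bench_cipher.c - rough single-core cipher throughput check.
 * Build with something like: gcc bench_cipher.c -o bench_cipher -lcrypto */
#include <openssl/evp.h>
#include <stdio.h>
#include <time.h>

static void bench (const char *name, const EVP_CIPHER *cipher)
{
  unsigned char key[32] = {0}, iv[16] = {0};     /* dummy key/IV, test only */
  static unsigned char in[1500], out[1500 + 64]; /* one MTU-sized packet */
  int outlen, i, iterations = 200000;
  double secs, mbit;
  clock_t start;
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new ();

  EVP_EncryptInit_ex (ctx, cipher, NULL, key, iv);
  start = clock ();
  for (i = 0; i < iterations; i++)
    EVP_EncryptUpdate (ctx, out, &outlen, in, sizeof (in));
  secs = (double) (clock () - start) / CLOCKS_PER_SEC;

  mbit = (double) iterations * sizeof (in) * 8.0 / secs / 1e6;
  printf ("%-12s %8.1f Mbit/s (single core)\n", name, mbit);
  EVP_CIPHER_CTX_free (ctx);
}

int main (void)
{
  bench ("BF-CBC", EVP_bf_cbc ());           /* OpenVPN's default cipher */
  bench ("AES-128-CBC", EVP_aes_128_cbc ());
  return 0;
}

If both come out far above the throughput you are actually seeing, the
cipher is unlikely to be the bottleneck.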
Do you know if threads are enabled in your OpenVPN setup?
(compile/configure setting). I believe the default is not to use
threads.
Does the performance drop if you have 150+ clients connected but
passive (not sending any traffic over the tunnel) and only 1 client
sending traffic?
kind regards,
David Sommerseth
Hi David,
I had hoped that "cipher none" would have the least overhead. Perhaps
there is a better one to try?
Hehe ... no, "cipher none" should indeed have the very least overhead. I
would be very much surprised if anything goes through OpenSSL at that point.
But I probably don't need to say anything about the security implications of
doing that. Anyway, for testing and debugging it is a good approach!
Threads are enabled in the build, but I only ever see one in the running
program. Maybe 64bit is showing it differently, or "ps axms" and "ps
-eLf" are not the way to display them?
ps -eLf should display all threads, afaik.
I'm not sure how the threads are really implemented, but when I dig into
the code it seems to be initialised as a single thread. I cannot find
traces in the code that indicate that multiple threads are implemented.
But it seems like the code is getting ready for it.
Please correct me if my suspicion is wrong, but it looks like the core
behaviour of the threaded and non-threaded binaries is almost the same,
with no thread being spawned per connection. If that is the case, I'm not
sure the threaded model has any performance impact at all, unless OpenSSL
encryption is running in its own separate thread (I have not investigated
this).
Performance seems fine if they are doing nothing. We can get the full
expected bandwidth from a single client, or even from a small number of
clients. But when the general use of the tunnels picks up, that's when
they appear to suffer.
I regret I do not have much in-depth info, but I'm really not sure which
direction I should be aiming in :).
Hmm ... that just seems to indicate that there is a drastic performance
drop once too many clients are using the tunnels.
When I look at the code, which is quite complex in the part where clients
connect, it seems like OpenVPN has its own way of scheduling when and how
to handle the clients. And it might be that you've found a limit in the
implementation.
This code is taken from mudp.c
  /* per-packet event loop */
  while (true)
    {
      perf_push (PERF_EVENT_LOOP);

      /* set up and do the io_wait() */
      multi_get_timeout (&multi, &multi.top.c2.timeval);
      io_wait (&multi.top, p2mp_iow_flags (&multi));
      MULTI_CHECK_SIG (&multi);

      /* check on status of coarse timers */
      multi_process_per_second_timers (&multi);

      /* timeout? */
      if (multi.top.c2.event_set_status == ES_TIMEOUT)
        {
          multi_process_timeout (&multi, MPP_PRE_SELECT|MPP_CLOSE_ON_SIGNAL);
        }
      else
        {
          /* process I/O */
          multi_process_io_udp (&multi);
          MULTI_CHECK_SIG (&multi);
        }

      perf_pop ();
    }
This seems to me to be the main loop. Here the OpenVPN server appears to
be listening for traffic on the network connections and processing each
packet, no matter which client sent it - analysing the packet and then
letting a connection "object" take care of the further processing. This is
just a wild guess, as I have only spent 10-15 min looking through the code.
But a lot of the processing magic happens in multi_process_io_udp(), and a
couple of levels deeper a scheduling function is called.
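To make the model I am describing a bit more concrete, here is a much
simplified sketch of how a single-process UDP server typically demultiplexes
all clients over one socket. This is my own illustration, not code from the
OpenVPN tree, and the client table and helper names are invented:

/* Single-threaded UDP demux sketch: all clients share one socket and one
 * event loop, and the packet's source address selects the per-client state. */
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_CLIENTS 256

struct client_ctx {                 /* stand-in for a per-client "connection object" */
  struct sockaddr_in addr;
  unsigned long packets;
  int in_use;
};

static struct client_ctx clients[MAX_CLIENTS];

/* Find (or create) the per-client context for this source address. */
static struct client_ctx *lookup_client (const struct sockaddr_in *from)
{
  int i, free_slot = -1;
  for (i = 0; i < MAX_CLIENTS; i++) {
    if (clients[i].in_use
        && clients[i].addr.sin_addr.s_addr == from->sin_addr.s_addr
        && clients[i].addr.sin_port == from->sin_port)
      return &clients[i];
    if (!clients[i].in_use && free_slot < 0)
      free_slot = i;
  }
  if (free_slot < 0)
    return NULL;
  clients[free_slot].addr = *from;
  clients[free_slot].packets = 0;
  clients[free_slot].in_use = 1;
  return &clients[free_slot];
}

int main (void)
{
  int sock = socket (AF_INET, SOCK_DGRAM, 0);
  struct sockaddr_in local;
  char buf[2048];

  memset (&local, 0, sizeof (local));
  local.sin_family = AF_INET;
  local.sin_port = htons (1194);              /* illustrative port */
  local.sin_addr.s_addr = htonl (INADDR_ANY);
  bind (sock, (struct sockaddr *) &local, sizeof (local));

  for (;;) {
    fd_set rfds;
    struct sockaddr_in from;
    socklen_t fromlen = sizeof (from);
    ssize_t len;
    struct client_ctx *c;

    FD_ZERO (&rfds);
    FD_SET (sock, &rfds);

    /* io_wait()-style wait: one wakeup per packet, regardless of which
     * client sent it */
    if (select (sock + 1, &rfds, NULL, NULL, NULL) <= 0)
      continue;

    len = recvfrom (sock, buf, sizeof (buf), 0,
                    (struct sockaddr *) &from, &fromlen);
    if (len <= 0)
      continue;

    /* the source address selects the per-client context, which would then
     * do the real work (decrypt, route, ...); here we just count and echo */
    c = lookup_client (&from);
    if (c) {
      c->packets++;
      sendto (sock, buf, (size_t) len, 0,
              (struct sockaddr *) &from, fromlen);
    }
  }
  return 0;
}

The important property is that every packet from every client funnels
through the same loop in the same thread, so any per-packet overhead scales
with the total packet rate, not with the rate of any single client.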
If this really is true, it might be that this model works very well for a
good number of clients, until you reach a limit around 150+, where the cost
of this rescheduling starts to become too high. If the scheduling is not
efficient enough (a small "sleep" in between, waiting for I/O, inefficient
or too many code jumps, etc.), you will not see the load on the server
increase very much - but you will most probably feel the performance loss
on the client side. With fewer active clients this will of course go
better, as the internal scheduler has fewer clients to switch between.
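Just to put a rough number on it (my own back-of-the-envelope estimate, not
a measurement): to fill 150Mb/s with ~1500 byte packets the server has to
get through about

  150,000,000 / (1500 * 8) = 12,500 packets per second

i.e. each pass through that per-packet loop (io_wait(), lookup, processing,
send) has on average less than 80 microseconds to complete, and any extra
scheduling or book-keeping per packet eats directly into that budget without
necessarily showing up as high CPU load.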
In addition, I see that the code path is quite long, doing a lot of jumps
between a lot of functions, and this of course also adds some penalty - even
though each function seems to be optimised.
This is of course a way to avoid forking or starting a new thread per
client which works independently and is task-switched by the OS. But to be
honest, I think the OS scheduler might be much more efficient at that kind
of scheduling and switching than a separate internal scheduler.
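Just to illustrate the contrast I mean, a thread-per-client model would
look roughly like the fragment below. This is purely my own illustration,
not a proposal for the OpenVPN code base, and every name in it is invented;
the point is only that each client's work runs in its own thread and the OS
scheduler, not an internal event loop, decides who runs next:

/* Thread-per-client sketch (illustration only).  A reader thread would
 * still pull packets off the shared UDP socket, but instead of processing
 * them inline it hands each one to the owning client's thread. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct packet {
  char data[2048];
  size_t len;
};

struct client_thread {
  pthread_t tid;
  pthread_mutex_t lock;
  pthread_cond_t ready;
  struct packet mailbox;            /* one-slot queue, just for illustration */
  int has_packet;
};

/* Each client owns one of these loops; it sleeps until the reader thread
 * drops a packet into its mailbox, then processes it independently. */
static void *client_worker (void *arg)
{
  struct client_thread *ct = arg;
  for (;;) {
    struct packet p;
    pthread_mutex_lock (&ct->lock);
    while (!ct->has_packet)
      pthread_cond_wait (&ct->ready, &ct->lock);
    p = ct->mailbox;
    ct->has_packet = 0;
    pthread_mutex_unlock (&ct->lock);

    /* decrypt/route/send would happen here, concurrently with other clients */
    printf ("worker %p processed %zu bytes\n", (void *) ct, p.len);
  }
  return NULL;
}

static struct client_thread *spawn_client (void)
{
  struct client_thread *ct = calloc (1, sizeof (*ct));
  pthread_mutex_init (&ct->lock, NULL);
  pthread_cond_init (&ct->ready, NULL);
  pthread_create (&ct->tid, NULL, client_worker, ct);
  return ct;
}

/* Reader side: hand the packet to the client's thread and return to the
 * socket immediately, instead of processing it in the event loop itself. */
static void dispatch (struct client_thread *ct, const char *buf, size_t len)
{
  pthread_mutex_lock (&ct->lock);
  memcpy (ct->mailbox.data, buf, len);
  ct->mailbox.len = len;
  ct->has_packet = 1;
  pthread_cond_signal (&ct->ready);
  pthread_mutex_unlock (&ct->lock);
}

int main (void)
{
  struct client_thread *ct = spawn_client ();
  dispatch (ct, "hello", 5);        /* stand-in for a packet from the socket */
  sleep (1);                        /* give the worker time to run, then exit */
  return 0;
}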
Can anyone with deeper knowledge than me verify or correct me? I would
like to understand this part of the code much better.
kind regards,
David Sommerseth