Re: [Openvpn-devel] Linux tun/tap performance issues

2010-03-16 Thread Jan Just Keijser

Hi David,

David Sommerseth wrote:

> On 15/03/10 16:29, Jan Just Keijser wrote:
>> More tests, this time with 'oprofile' : here's a recap:
>> - nothing changed on the server side:
>> openvpn --ifconfig 10.222.0.1 10.222.0.2 --dev tun --secret secret.key
>> --cipher none
>>
>> - upgraded to kernel 2.6.32.9-70.fc12.x86_64 on my laptop
>> - selinux is disabled
>> - installed the debuginfo rpms to get a 'vmlinux'
>> - configure the oprofile daemon using
>> opcontrol
>> --vmlinux=/usr/lib/debug/lib/modules/2.6.32.9-70.fc12.x86_64/vmlinux
>> - now start it, reset the statistics, start openvpn
>>  opcontrol --start
>>  opcontrol --reset
>>  ./openvpn --ifconfig 10.222.0.2 10.222.0.1 --dev tun --secret
>> secret.key --remote kudde.nikhef.nl --cipher none
>> - download a file using 'nc' (this maxes out my 100 Mbps LAN at roughly
>> 11 MB/s)
>> - get the statistics
>>  opcontrol --dump
>>  opreport -l > stats
>
> Thanks a lot!  This is way cool!  It just strikes me that you probably
> should play with ftrace instead of oprofile.  It's better and has
> lower overhead than oprofile, afaik.
>
> (btw. with an ftrace enabled kernel, it's even available on embedded
> devices. You only need 'mount', 'echo' and 'cat' to interact with ftrace
> ... even though Steven Rostedt is working on a GUI for ftrace, called
> kernelshark)
>
> With such a new kernel, there's also the perf tool.
>
> In both of these tools you have something called callgraph, iirc.  And
> with that you see which function is calling which function and the
> amount of time each step used.
>
> I'll ask some people who've been involved in both ftrace and perf for
> some better pointers!

your wish is my command ;-) :

actually, 'perf' is a lot easier to use than opcontrol/opreport.  I ran

 perf record -g -f -- ./openvpn ..
 perf report --call-graph > perf.data-20100316

At
 http://www.nikhef.nl/~janjust/openvpn/perf.callgraph-20100316.gz
you will find the callgraph for a 512 MB file copy ; the rest is 
identical to the previous tests.


share and enjoy,

JJK / Jan Just Keijser




Re: [Openvpn-devel] Linux tun/tap performance issues

2010-03-15 Thread David Sommerseth

On 15/03/10 16:29, Jan Just Keijser wrote:
> More tests, this time with 'oprofile' : here's a recap:
> - nothing changed on the server side:
> openvpn --ifconfig 10.222.0.1 10.222.0.2 --dev tun --secret secret.key
> --cipher none
> 
> - upgraded to kernel 2.6.32.9-70.fc12.x86_64 on my laptop
> - selinux is disabled
> - installed the debuginfo rpms to get a 'vmlinux'
> - configure the oprofile daemon using
> opcontrol
> --vmlinux=/usr/lib/debug/lib/modules/2.6.32.9-70.fc12.x86_64/vmlinux
> - now start it, reset the statistics, start openvpn
>  opcontrol --start
>  opcontrol --reset
>  ./openvpn --ifconfig 10.222.0.2 10.222.0.1 --dev tun --secret
> secret.key --remote kudde.nikhef.nl --cipher none
> - download a file using 'nc' (this maxes out my 100 Mbps LAN at roughly
> 11 MB/s)
> - get the statistics
>  opcontrol --dump
>  opreport -l > stats

Thanks a lot!  This is way cool!  It just strikes me that you probably
should play with ftrace instead of oprofile.  It's better and has
lower overhead than oprofile, afaik.



(btw. with an ftrace enabled kernel, it's even available on embedded
devices. You only need 'mount', 'echo' and 'cat' to interact with ftrace
... even though Steven Rostedt is working on a GUI for ftrace, called
kernelshark)

With such a new kernel, there's also the perf tool.



In both of these tools you have something called callgraph, iirc.  And
with that you see which function is calling which function and the
amount of time each step used.

I'll ask some people who've been involved in both ftrace and perf for
some better pointers!

> Here're the results on my laptop, running at runlevel 2 with as many
> daemons stopped and modules unloaded as possible:
> 
> when downloading a 100 MB file (using nc) I see:
> 
>> head -20 after.100mb
> samples  %        app name            symbol name
> [...]8   30.0622  vmlinux             read_hpet
> 19613    10.6125  vmlinux             mwait_idle_with_hints

read_hpet usually means HPET timer.  I'm actually surprised to see that
one so high up on the list.  Right now I don't recall what
mwait_idle_with_hints does, but I guess it pops up high due to it
calling read_hpet().

> 10692    5.7854   libcrypto.so.1.0.0  /usr/lib64/libcrypto.so.1.0.0
> 5407     2.9257   vmlinux             acpi_os_read_port
> 2546     1.3776   vmlinux             copy_user_generic_string
> 1945     1.0524   opreport            /usr/bin/opreport
> 1885     1.0200   vmlinux             hpet_next_event
> 1325     0.7170   tg3                 /tg3
> 1235     0.6683   vmlinux             schedule
> 1121     0.6066   tun                 /tun
> 1049     0.5676   vmlinux             do_sys_poll
> 796      0.4307   vmlinux             acpi_idle_enter_bm
> 795      0.4302   vmlinux             sched_clock_local
> 769      0.4161   vmlinux             tick_broadcast_oneshot_control
> 757      0.4096   vmlinux             tcp_packet
> 749      0.4053   vmlinux             cfb_imageblit
> 728      0.3939   vmlinux             system_call
> 
> Observations:
> - why the heck is libcrypto so high on the list? I used 'cipher none' !
> - the 'tun' driver does not seem to be the bottleneck

It still might be that there is some locking going on, where the tun
driver calls code paths which hit mwait_idle_with_hints ...
(this is why I'm recommending to have a look at the callgraph in perf or
ftrace)

> Ah, of course,  openvpn still used crypto for the HMAC handshake!

That will cause calls to libcrypto.

> After adding '--auth none' to both client and server (and a tweak to
> opreport) I now get:
> 
> samples  %        linenr info     app name  symbol name
> 140883   31.1707  hpet.c:748      vmlinux   read_hpet
> 51808    11.4626  process.c:356   vmlinux   mwait_idle_with_hints

Still amazingly high!

[snip]
> Observations:
> - note that openvpn itself does not even make the top 15. It's lower in
> the list, however:
> 11896 0.3580  openvpn  openvpn  io_wait_dowork
> 10842 0.3263  openvpn  openvpn  po_wait
> 9608  0.2892  openvpn  openvpn  openvpn_decrypt
> 9449  0.2844  openvpn  openvpn  main
> 9250  0.2784  openvpn  openvpn  pre_select
> 9191  0.2766  openvpn  openvpn  process_incoming_link
> 7027  0.2115  openvpn  openvpn  po_ctl
> 4148  0.1248  openvpn  openvpn  packet_id_add
> 4090  0.1231  openvpn  openvpn  mss_fixup
> 4022  0.1210  openvpn

Re: [Openvpn-devel] Linux tun/tap performance issues

2010-03-15 Thread Jan Just Keijser

More tests, this time with 'oprofile' : here's a recap:
- nothing changed on the server side:
openvpn --ifconfig 10.222.0.1 10.222.0.2 --dev tun --secret secret.key --cipher none

- upgraded to kernel 2.6.32.9-70.fc12.x86_64 on my laptop
- selinux is disabled
- installed the debuginfo rpms to get a 'vmlinux'
- configure the oprofile daemon using
 opcontrol --vmlinux=/usr/lib/debug/lib/modules/2.6.32.9-70.fc12.x86_64/vmlinux
- now start it, reset the statistics, start openvpn
 opcontrol --start
 opcontrol --reset
 ./openvpn --ifconfig 10.222.0.2 10.222.0.1 --dev tun --secret secret.key --remote kudde.nikhef.nl --cipher none
- download a file using 'nc' (this maxes out my 100 Mbps LAN at roughly
11 MB/s)
- get the statistics
 opcontrol --dump
 opreport -l > stats
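
As a rough sanity check on the "roughly 11 MB/s" figure in the recap above, the payload rate of a TCP stream on 100 Mbps Ethernet can be estimated from the per-frame overhead. This is only a sketch under assumptions that are not stated in the thread: a 1500-byte MTU, IPv4 without options, and TCP with the 12-byte timestamp option. It ignores the extra encapsulation added by the tunnel itself, so it is an upper bound for traffic sent through the VPN. The function name is made up for illustration.

```python
# Back-of-the-envelope payload throughput for TCP over 100 Mbps Ethernet.
# All sizes are in bytes; the header sizes below are assumptions, see lead-in.
LINK_BPS = 100_000_000          # nominal line rate in bits per second
MTU = 1500                      # IP packet size carried per Ethernet frame
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap
IP_HDR = 20                     # IPv4 header without options
TCP_HDR = 20 + 12               # TCP header plus timestamp option (assumed)


def payload_mb_per_s(link_bps=LINK_BPS):
    """Return the TCP payload rate in MB/s (1 MB = 10**6 bytes)."""
    wire_bytes = MTU + ETH_OVERHEAD        # bytes occupied on the wire per frame
    payload = MTU - IP_HDR - TCP_HDR       # application bytes per frame
    return link_bps / 8 * payload / wire_bytes / 1e6


print(f"{payload_mb_per_s():.2f} MB/s")
```

Under these assumptions the ceiling is a little under 12 MB/s, so a measured rate of about 11 MB/s is consistent with the link, not the tunnel, being the limit.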

Here're the results on my laptop, running at runlevel 2 with as many
daemons stopped and modules unloaded as possible:

when downloading a 100 MB file (using nc) I see:


head -20 after.100mb

samples  %        app name            symbol name
[...]8   30.0622  vmlinux             read_hpet
19613    10.6125  vmlinux             mwait_idle_with_hints
10692    5.7854   libcrypto.so.1.0.0  /usr/lib64/libcrypto.so.1.0.0
5407     2.9257   vmlinux             acpi_os_read_port
2546     1.3776   vmlinux             copy_user_generic_string
1945     1.0524   opreport            /usr/bin/opreport
1885     1.0200   vmlinux             hpet_next_event
1325     0.7170   tg3                 /tg3
1235     0.6683   vmlinux             schedule
1121     0.6066   tun                 /tun
1049     0.5676   vmlinux             do_sys_poll
796      0.4307   vmlinux             acpi_idle_enter_bm
795      0.4302   vmlinux             sched_clock_local
769      0.4161   vmlinux             tick_broadcast_oneshot_control
757      0.4096   vmlinux             tcp_packet
749      0.4053   vmlinux             cfb_imageblit
728      0.3939   vmlinux             system_call

Observations:
- why the heck is libcrypto so high on the list? I used 'cipher none' !
- the 'tun' driver does not seem to be the bottleneck

Ah, of course,  openvpn still used crypto for the HMAC handshake!
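
To see at a glance how the samples split between the kernel, libcrypto and the tun module, the per-symbol rows can be summed per "app name". The following is a minimal sketch, not part of the original test: it assumes whitespace-separated rows in the order samples, percentage, app name, symbol name, and it uses a handful of rows copied from the listing above (the first row is left out because its sample count did not survive). The function name `samples_by_app` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the 'opreport -l' listing above; the read_hpet row is
# omitted because its sample count is not recoverable from the archive.
REPORT = """\
19613 10.6125 vmlinux mwait_idle_with_hints
10692 5.7854 libcrypto.so.1.0.0 /usr/lib64/libcrypto.so.1.0.0
5407 2.9257 vmlinux acpi_os_read_port
2546 1.3776 vmlinux copy_user_generic_string
1945 1.0524 opreport /usr/bin/opreport
1885 1.0200 vmlinux hpet_next_event
1325 0.7170 tg3 /tg3
1235 0.6683 vmlinux schedule
1121 0.6066 tun /tun
"""


def samples_by_app(text):
    """Sum the sample counts per app name (third column)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip the header line and malformed rows
        totals[fields[2]] += int(fields[0])
    return dict(totals)


totals = samples_by_app(REPORT)
for app, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{count:8d}  {app}")
```

The same function can be pointed at a complete report by reading the 'stats' file instead of the embedded string, provided the columns are in the same order.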

After adding '--auth none' to both client and server (and a tweak to opreport) 
I now get:

samples  %        linenr info                   app name   symbol name
140883   31.1707  hpet.c:748                    vmlinux    read_hpet
51808    11.4626  process.c:356                 vmlinux    mwait_idle_with_hints
13400    2.9648   osl.c:480                     vmlinux    acpi_os_read_port
7034     1.5563   copy_user_64.S:241            vmlinux    copy_user_generic_string
5837     1.2914   hpet.c:380                    vmlinux    hpet_next_event
3334     0.7377   sched.c:5431                  vmlinux    schedule
3207     0.7096   select.c:813                  vmlinux    do_sys_poll
2499     0.5529   nf_conntrack_proto_tcp.c:824  vmlinux    tcp_packet
2350     0.5199   entry_64.S:461                vmlinux    system_call
2274     0.5031   tick-broadcast.c:454          vmlinux    tick_broadcast_oneshot_control
2228     0.4929   processor_idle.c:947          vmlinux    acpi_idle_enter_bm
2204     0.4876   nf_conntrack_core.c:753       vmlinux    nf_conntrack_in
2152     0.4761   ip_tables.c:309               vmlinux    ipt_do_table
2004     0.4434   sched_clock.c:105             vmlinux    sched_clock_local
1966     0.4350   core.c:122                    vmlinux    nf_iterate
1929     0.4268   clockevents.c:241             vmlinux    clockevents_notify
1904     0.4213   rtc.c:195                     vmlinux    native_read_tsc
1673     0.3702   wait.c:45                     vmlinux    remove_wait_queue
1656     0.3664   select.c:218                  vmlinux    __pollwait
1595     0.3529   wait.c:23                     vmlinux    add_wait_queue
1572     0.3478   sched_fair.c:1362             vmlinux    select_task_rq_fair
1511     0.3343   tick-sched.c:214              vmlinux    tick_nohz_stop_sched_tick
1479     0.3272   random.c:461                  vmlinux    mix_pool_bytes_extract
1457     0.3224   file_table.c:327              vmlinux    fget_light
1444     0.3195   nf_conntrack_core.c:72        vmlinux    __hash_conntrack
1402     0.3102   (no location information)     oprofiled  /usr/bin/oprofiled
1386     0.3067   entry_64.S:781                vmlinux    irq_entries_start
1347     0.2980   auditsc.c:1680                vmlinux    audit_syscall_exit
1343     0.2971   skbuff.c:174                  vmlinux    __alloc_skb
1342  

Re: [Openvpn-devel] Linux tun/tap performance issues

2010-03-11 Thread Jan Just Keijser

Hi all,

just ran a very silly test, all with openvpn 2.1.1, on my laptop running 
FC12 (2.6.31.12-174.2.22.fc12.x86_64) ,  Intel(R) Core(TM)2 Duo CPU 
T9300 @ 2.50GHz, connected to a 100 Mbps LAN


server side:  
 openvpn --ifconfig 10.222.0.1 10.222.0.2 --dev tun --secret secret.key 
--cipher none

client side:
 time ./openvpn --ifconfig 10.222.0.2 10.222.0.1 --dev tun --secret 
secret.key --remote kudde.nikhef.nl --cipher none


and then use 'nc' to pipe a 100 MB file from server to client :

nc'ing it once:
user    0m2.150s
sys     0m3.972s

nc'ing it twice:
user    0m4.128s
sys     0m8.012s

Now with some encryption:

server side:  
 openvpn --ifconfig 10.222.0.1 10.222.0.2 --dev tun --secret secret.key

client side:
 time ./openvpn --ifconfig 10.222.0.2 10.222.0.1 --dev tun --secret 
secret.key --remote kudde.nikhef.nl


nc'ing it once:
user    0m5.010s
sys     0m4.385s

nc'ing it twice:
user    0m9.922s
sys     0m8.536s

Transfer speeds are about 10 MB/s in all cases, i.e. the LAN is maxed out.

So yes, quite some time is spent in 'sys', but I don't think it is too 
much ... my guess is that with older CPUs the ratio of time spent in 
the kernel to time spent in user space was different, i.e. older 
CPUs had more trouble with the encryption/decryption, hence you'd see a 
larger difference between user and sys.
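
To put the four 'time' results above on the same footing as the 80%/20% split reported earlier in the thread, the kernel share of the total CPU time can be worked out directly. This is just arithmetic on the figures quoted in this message; the labels and the helper name `sys_share` are made up for illustration.

```python
# (user, sys) CPU seconds reported by 'time' for the openvpn client,
# copied from the four runs above.
RUNS = {
    "cipher none, 1 x 100 MB": (2.150, 3.972),
    "cipher none, 2 x 100 MB": (4.128, 8.012),
    "encrypted,   1 x 100 MB": (5.010, 4.385),
    "encrypted,   2 x 100 MB": (9.922, 8.536),
}


def sys_share(user_s, sys_s):
    """Fraction of the total CPU time that was spent in the kernel."""
    return sys_s / (user_s + sys_s)


for label, (user_s, sys_s) in RUNS.items():
    print(f"{label}  sys share {sys_share(user_s, sys_s):.0%}")
```

For these numbers the kernel share comes out at about 65% without a cipher and about 46 to 47% with encryption. The sys time stays at roughly 4 s per 100 MB in both cases, while the user time grows by nearly 3 s per 100 MB once encryption is on.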



cheers,

JJK / Jan Just Keijser



David Sommerseth wrote:


> On 08/03/10 15:47, James Yonan wrote:
>> I believe this has been discussed before, but I noticed recently that a
>> Linux-based OpenVPN client (Linux 2.6.24, OpenVPN 2.1.1) spends a lot
>> more CPU time in kernel space than in user space.  This is surprising,
>> given the fact that all of the CPU-intensive cryptographic operations
>> are being done in user space.
>>
>> Using the 'time' utility on the OpenVPN client, while a wget of a 50MB
>> file was done over the VPN, I found that 80% of the CPU time was taken
>> by the kernel, and 20% by user space (I should add that the Linux client
>> was running as a VM on VMWare Fusion).
>>
>> I'm wondering if anyone with kernel background has any insights on this.
>> Are there performance bottlenecks in the tun/tap driver?
>
> I'm not a hardcore kernel developer, but I'm interested in the to
>
> James, would you mind trying to run a test on the 2.6.33 kernel?  That's
> the newest stable kernel available, just to see if your observations are
> visible there as well, and if they are better or worse.
>
> And when you say earlier kernels did not use so much time in kernel
> space, which kernels did you compare against?  It would be interesting
> to look through the changelog from the "good" kernel version to 2.6.24,
> to see what was changed in tun.c.
>
> In this case, it would be easier to try to nail down where such a
> behavioural change happened.  And after all, 2.6.24 was released about 2
> years ago (Jan 24 14:58:37 2008) ... and the drivers/net/tun.c has been
> changed 72 times, as far as I can see.  It's a big job going on getting
> rid of the BKL (big kernel lock, blocking all apps while kernel is doing
> something) ... so it's difficult to say now just what could have been
> the reason for what you observe.
>
> James, if you have a "test script" with configuration files, I can setup
> a test environment and run some tests and also enable ftrace [1], which
> could also pin-point more where the kernel spends its time doing things.

  





Re: [Openvpn-devel] Linux tun/tap performance issues

2010-03-08 Thread Peter Stuge
James Yonan wrote:
> all of the CPU-intensive cryptographic operations are being done in
> user space.

Could some kind of crypto acceleration of OpenSSL be in play?


//Peter



[Openvpn-devel] Linux tun/tap performance issues

2010-03-08 Thread James Yonan
I believe this has been discussed before, but I noticed recently that a 
Linux-based OpenVPN client (Linux 2.6.24, OpenVPN 2.1.1) spends a lot 
more CPU time in kernel space than in user space.  This is surprising, 
given the fact that all of the CPU-intensive cryptographic operations 
are being done in user space.


Using the 'time' utility on the OpenVPN client, while a wget of a 50MB 
file was done over the VPN, I found that 80% of the CPU time was taken 
by the kernel, and 20% by user space (I should add that the Linux client 
was running as a VM on VMWare Fusion).


I'm wondering if anyone with kernel background has any insights on this. 
 Are there performance bottlenecks in the tun/tap driver?


James
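
The user/kernel split that 'time' reports, as used for the measurement described above, can also be collected programmatically. Below is a minimal sketch using Python's standard `resource` module on a Unix-like system. The helper name `cpu_split` and the placeholder workload are assumptions for illustration only. Like 'time', it counts only CPU time charged to the child process that is run and waited for.

```python
import resource
import subprocess
import sys


def cpu_split(argv):
    """Run argv to completion and return (user_s, sys_s) CPU seconds used."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN is cumulative over all waited-for children,
    # so the difference isolates the process started above.
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


# Placeholder workload: replace with the command to be measured.
user_s, sys_s = cpu_split([sys.executable, "-c", "sum(range(10**6))"])
total = user_s + sys_s
if total > 0:
    print(f"user {user_s:.3f}s  sys {sys_s:.3f}s  kernel share {sys_s / total:.0%}")
```

For a long-running client the command would have to exit (or be stopped) after the transfer so that its CPU time is accounted for, which is the same constraint 'time' has.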