Re: Some performance measurements on the FreeBSD network stack

2012-05-03 Thread Luigi Rizzo
On Thu, Apr 19, 2012 at 03:30:18PM +0200, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code and the kernel to return at various points of the path. Here are some results which

RE: Some performance measurements on the FreeBSD network stack

2012-04-25 Thread Maxim Konovalov
Hi, On Tue, 24 Apr 2012, 17:40-, Li, Qing wrote: Yup, all good points. In fact we have considered all of these while doing the work. In case you haven't seen it already, we did write about these issues in our paper and how we tried to address those, flow-table was one of the solutions.

Re: Some performance measurements on the FreeBSD network stack

2012-04-25 Thread Slawa Olhovchenkov
On Wed, Apr 25, 2012 at 01:22:06PM +0400, Maxim Konovalov wrote: Hi, On Tue, 24 Apr 2012, 17:40-, Li, Qing wrote: Yup, all good points. In fact we have considered all of these while doing the work. In case you haven't seen it already, we did write about these issues in our paper

Re: Some performance measurements on the FreeBSD network stack

2012-04-25 Thread K. Macy
Because there were leaks, there were 100% panics for IPv6, ... at least on the version I had seen in autumn last year. There is certainly no one more interested than me on these in, esp. for v6 where the removal of route caching a long time ago made nd6_nud_hint() a NOP with dst and rt being

Re: Some performance measurements on the FreeBSD network stack

2012-04-25 Thread Bjoern A. Zeeb
On 25. Apr 2012, at 15:45 , K. Macy wrote: a) Where is the possible leak in the legacy path? It's been somewhere in ip_output() in one of the possible combinations going through the code flow. I'd probably need to apply a patch to a tree to get there again. It's been more than 6 months for me as

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Andre Oppermann
On 19.04.2012 22:46, Luigi Rizzo wrote: On Thu, Apr 19, 2012 at 10:05:37PM +0200, Andre Oppermann wrote: On 19.04.2012 15:30, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Luigi Rizzo
On Tue, Apr 24, 2012 at 03:16:48PM +0200, Andre Oppermann wrote: On 19.04.2012 22:46, Luigi Rizzo wrote: On Thu, Apr 19, 2012 at 10:05:37PM +0200, Andre Oppermann wrote: On 19.04.2012 15:30, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend

RE: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Li, Qing
From previous tests, the difference between flowtable and routing table was small with a single process (about 5% or 50ns in the total packet processing time, if i remember well), but there was a large gain with multiple concurrent processes. Yes, that sounds about right when we did the tests a

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread K. Macy
On Tue, Apr 24, 2012 at 4:16 PM, Li, Qing qing...@bluecoat.com wrote: From previous tests, the difference between flowtable and routing table was small with a single process (about 5% or 50ns in the total packet processing time, if i remember well), but there was a large gain with multiple

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread K. Macy
On Tue, Apr 24, 2012 at 5:03 PM, K. Macy km...@freebsd.org wrote: On Tue, Apr 24, 2012 at 4:16 PM, Li, Qing qing...@bluecoat.com wrote: From previous tests, the difference between flowtable and routing table was small with a single process (about 5% or 50ns in the total packet processing time,

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Luigi Rizzo
On Tue, Apr 24, 2012 at 02:16:18PM +, Li, Qing wrote: From previous tests, the difference between flowtable and routing table was small with a single process (about 5% or 50ns in the total packet processing time, if i remember well), but there was a large gain with multiple concurrent

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread K. Macy
On Tue, Apr 24, 2012 at 6:34 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Tue, Apr 24, 2012 at 02:16:18PM +, Li, Qing wrote: From previous tests, the difference between flowtable and routing table was small with a single process (about 5% or 50ns in the total packet processing time, if i

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Fabien Thomas
I have a patch that has been sitting around for a long time due to review cycle latency that caches a pointer to the rtentry (and llentry) in the inpcb. Before each use the rtentry is checked against a generation number in the routing tree that is incremented on every routing table
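
The caching scheme described in this message can be illustrated with a minimal user-space sketch. It is not the patch under discussion: every `demo_` name is hypothetical, the table holds a single route, and locking and reference counting are left out. It only shows the idea of validating a cached pointer against a generation counter before each use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the routing table, rtentry and inpcb. */
struct demo_route {
    uint32_t nexthop;
};

struct demo_rtable {
    uint64_t generation;        /* incremented on every table change */
    struct demo_route entry;    /* one route is enough for the sketch */
    unsigned lookups;           /* counts slow-path lookups */
};

struct demo_pcb {
    const struct demo_route *cached_rt; /* NULL until the first lookup */
    uint64_t cached_gen;                /* generation seen at cache time */
};

/* Slow path: stands in for a full routing table lookup. */
static const struct demo_route *
demo_lookup(struct demo_rtable *t)
{
    t->lookups++;
    return &t->entry;
}

/* Any table change bumps the generation, which invalidates all caches. */
static void
demo_rtable_change(struct demo_rtable *t, uint32_t nexthop)
{
    t->entry.nexthop = nexthop;
    t->generation++;
}

/* Fast path: reuse the cached pointer while the generation is unchanged. */
static const struct demo_route *
demo_pcb_route(struct demo_pcb *pcb, struct demo_rtable *t)
{
    assert(pcb != NULL && t != NULL);
    if (pcb->cached_rt == NULL || pcb->cached_gen != t->generation) {
        pcb->cached_rt = demo_lookup(t);
        pcb->cached_gen = t->generation;
    }
    return pcb->cached_rt;
}
```

In this sketch, repeated calls to `demo_pcb_route()` cost one integer comparison until `demo_rtable_change()` runs, after which the next call falls back to the slow path once.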

RE: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Li, Qing
Yup, all good points. In fact we have considered all of these while doing the work. In case you haven't seen it already, we did write about these issues in our paper and how we tried to address those, flow-table was one of the solutions. http://dl.acm.org/citation.cfm?id=1592641 --Qing

RE: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Li, Qing
I have a patch that has been sitting around for a long time due to review cycle latency that caches a pointer to the rtentry (and llentry) in the inpcb. Before each use the rtentry is checked against a generation number in the routing tree that is incremented on every routing

Re: Some performance measurements on the FreeBSD network stack

2012-04-24 Thread Bjoern A. Zeeb
On 24. Apr 2012, at 17:42 , Li, Qing wrote: I have a patch that has been sitting around for a long time due to review cycle latency that caches a pointer to the rtentry (and llentry) in the inpcb. Before each use the rtentry is checked against a generation number in the routing tree

Re: Some performance measurements on the FreeBSD network stack

2012-04-22 Thread K. Macy
Most of these issues are well known. Addressing the bottlenecks is simply time-consuming because any bugs introduced during development potentially impact many users. -Kip On Sun, Apr 22, 2012 at 4:14 AM, Adrian Chadd adr...@freebsd.org wrote: Hi, This honestly sounds like it's

Re: Some performance measurements on the FreeBSD network stack

2012-04-21 Thread Bruce Evans
On Fri, 20 Apr 2012, K. Macy wrote: On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo ri...@iet.unipi.it wrote: The small penalty when flowtable is disabled but compiled in is probably because the net.flowtable.enable flag is checked a bit deep in the code. The advantage with non-connect()ed

Re: Some performance measurements on the FreeBSD network stack

2012-04-21 Thread Adrian Chadd
Hi, This honestly sounds like it's begging for an instrumentation/analysis/optimisation project. What do we need to do? Adrian ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread Luigi Rizzo
On Fri, Apr 20, 2012 at 12:37:21AM +0200, Andre Oppermann wrote: On 20.04.2012 00:03, Luigi Rizzo wrote: On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: On 19.04.2012 22:46, Luigi Rizzo wrote: The allocation happens while the code has already an exclusive lock on so->snd_buf

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread Alexander V. Chernikov
On 20.04.2012 01:12, Andre Oppermann wrote: On 19.04.2012 22:34, K. Macy wrote: This is indeed a big problem. I'm working (rough edges remain) on changing the routing table locking to an rmlock (read-mostly) which This only helps if your flows aren't hitting the same rtentry. Otherwise you

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread Andre Oppermann
On 20.04.2012 10:26, Alexander V. Chernikov wrote: On 20.04.2012 01:12, Andre Oppermann wrote: On 19.04.2012 22:34, K. Macy wrote: If the number of peers is bounded then you can use the flowtable. Max PPS is much higher bypassing routing lookup. However, it doesn't scale From my

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread Andre Oppermann
On 20.04.2012 08:35, Luigi Rizzo wrote: On Fri, Apr 20, 2012 at 12:37:21AM +0200, Andre Oppermann wrote: On 20.04.2012 00:03, Luigi Rizzo wrote: On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: On 19.04.2012 22:46, Luigi Rizzo wrote: The allocation happens while the code has

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread John Baldwin
On Thursday, April 19, 2012 4:46:22 pm Luigi Rizzo wrote: What might be moderately expensive are the critical_enter()/critical_exit() calls around individual allocations. The allocation happens while the code has already an exclusive lock on so->snd_buf so a pool of fresh buffers could be

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread Luigi Rizzo
On Thu, Apr 19, 2012 at 11:06:38PM +0200, K. Macy wrote: On Thu, Apr 19, 2012 at 11:22 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote: This is indeed a big problem. I'm working (rough edges remain) on changing the routing table locking

Re: Some performance measurements on the FreeBSD network stack

2012-04-20 Thread K. Macy
Comments inline below: On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Thu, Apr 19, 2012 at 11:06:38PM +0200, K. Macy wrote: On Thu, Apr 19, 2012 at 11:22 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote: This is

Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Luigi Rizzo
I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code and the kernel to return at various points of the path. Here are some results which I hope you find interesting. Test conditions: - intel i7-870

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Slawa Olhovchenkov
On Thu, Apr 19, 2012 at 03:30:18PM +0200, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code and the kernel to return at various points of the path. Here are some results

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Andre Oppermann
On 19.04.2012 15:30, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code and the kernel to return at various points of the path. Here are some results which I hope you find

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Luigi Rizzo
On Thu, Apr 19, 2012 at 10:05:37PM +0200, Andre Oppermann wrote: On 19.04.2012 15:30, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code and the kernel to return at various

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread K. Macy
This is indeed a big problem. I'm working (rough edges remain) on changing the routing table locking to an rmlock (read-mostly) which This only helps if your flows aren't hitting the same rtentry. Otherwise you still convoy on the lock for the rtentry itself to increment and decrement the

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Luigi Rizzo
On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote: This is indeed a big problem. I'm working (rough edges remain) on changing the routing table locking to an rmlock (read-mostly) which This only helps if your flows aren't hitting the same rtentry. Otherwise you still convoy on the

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread K. Macy
On Thu, Apr 19, 2012 at 11:22 PM, Luigi Rizzo ri...@iet.unipi.it wrote: On Thu, Apr 19, 2012 at 10:34:45PM +0200, K. Macy wrote: This is indeed a big problem. I'm working (rough edges remain) on changing the routing table locking to an rmlock (read-mostly) which This only helps if your

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Andre Oppermann
On 19.04.2012 22:34, K. Macy wrote: This is indeed a big problem. I'm working (rough edges remain) on changing the routing table locking to an rmlock (read-mostly) which This only helps if your flows aren't hitting the same rtentry. Otherwise you still convoy on the lock for the rtentry

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread K. Macy
This only helps if your flows aren't hitting the same rtentry. Otherwise you still convoy on the lock for the rtentry itself to increment and decrement the rtentry's reference count. The rtentry lock isn't obtained anymore. While the rmlock read lock is held on the rtable the relevant

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Andre Oppermann
On 19.04.2012 22:46, Luigi Rizzo wrote: On Thu, Apr 19, 2012 at 10:05:37PM +0200, Andre Oppermann wrote: On 19.04.2012 15:30, Luigi Rizzo wrote: I have been running some performance tests on UDP sockets, using the netsend program in tools/tools/netrate/netsend and instrumenting the source code

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Andre Oppermann
On 19.04.2012 23:17, K. Macy wrote: This only helps if your flows aren't hitting the same rtentry. Otherwise you still convoy on the lock for the rtentry itself to increment and decrement the rtentry's reference count. The rtentry lock isn't obtained anymore. While the rmlock read lock is

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread K. Macy
Yes, but the lookup requires a lock? Or is every entry replicated to every CPU? So a number of concurrent CPUs sending to the same UDP destination would contend on that lock? No. In the default case it's per CPU, thus no serialization is required. But yes, if your transmitting thread
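
The "per CPU, thus no serialization" point can be illustrated with a small user-space sketch. It is not the flowtable code: all `demo_` names, the table sizes and the hash are assumptions made for illustration, and the CPU index is passed in explicitly. The sketch assumes the caller stays on that CPU for the duration of the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NCPU   4   /* hypothetical CPU count */
#define DEMO_SLOTS  64  /* hypothetical slots per table */

struct demo_flow {
    uint32_t dst;       /* destination address, used as the key */
    uint32_t nexthop;   /* cached forwarding decision */
    int valid;
};

struct demo_flowtable {
    struct demo_flow slot[DEMO_SLOTS];
};

/* One private table per CPU: only its owner touches it, so no lock. */
static struct demo_flowtable demo_percpu[DEMO_NCPU];

static size_t
demo_hash(uint32_t dst)
{
    return (dst * 2654435761u) % DEMO_SLOTS;
}

/* Record a flow in the calling CPU's table, overwriting any collision. */
static void
demo_flow_insert(unsigned cpu, uint32_t dst, uint32_t nexthop)
{
    assert(cpu < DEMO_NCPU);
    struct demo_flow *f = &demo_percpu[cpu].slot[demo_hash(dst)];
    f->dst = dst;
    f->nexthop = nexthop;
    f->valid = 1;
}

/* Returns 1 and fills *nexthop on a hit in this CPU's table, else 0. */
static int
demo_flow_lookup(unsigned cpu, uint32_t dst, uint32_t *nexthop)
{
    assert(cpu < DEMO_NCPU && nexthop != NULL);
    const struct demo_flow *f = &demo_percpu[cpu].slot[demo_hash(dst)];
    if (!f->valid || f->dst != dst)
        return 0;
    *nexthop = f->nexthop;
    return 1;
}
```

The trade-off visible even in this sketch is that a flow learned on one CPU is a miss on every other CPU, so each table has to be populated separately.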

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread K. Macy
On Thu, Apr 19, 2012 at 11:27 PM, Andre Oppermann an...@freebsd.org wrote: On 19.04.2012 23:17, K. Macy wrote: This only helps if your flows aren't hitting the same rtentry. Otherwise you still convoy on the lock for the rtentry itself to increment and decrement the rtentry's reference count.

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Luigi Rizzo
On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: On 19.04.2012 22:46, Luigi Rizzo wrote: ... What might be moderately expensive are the critical_enter()/critical_exit() calls around individual allocations. Can't get away from those as a thread must not migrate away when

Re: Some performance measurements on the FreeBSD network stack

2012-04-19 Thread Andre Oppermann
On 20.04.2012 00:03, Luigi Rizzo wrote: On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote: On 19.04.2012 22:46, Luigi Rizzo wrote: The allocation happens while the code has already an exclusive lock on so->snd_buf so a pool of fresh buffers could be attached there. Ah, there it
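
The suggestion of attaching a pool of fresh buffers to the already-locked send buffer might look roughly like the following user-space sketch. It is only an illustration under stated assumptions: the `demo_` types are hypothetical, `malloc()` stands in for the general allocator, and the `locked` flag stands in for the exclusive lock the caller is said to hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_POOL_MAX 8 /* hypothetical cap on pooled buffers */

struct demo_buf {
    struct demo_buf *next;
    char data[256];
};

struct demo_sockbuf {
    int locked;             /* stand-in for the exclusive lock */
    struct demo_buf *pool;  /* free buffers private to this socket */
    unsigned pool_len;
    unsigned global_allocs; /* counts trips to the general allocator */
};

/* Take a buffer from the socket's pool; fall back to the allocator. */
static struct demo_buf *
demo_buf_get(struct demo_sockbuf *sb)
{
    assert(sb != NULL && sb->locked);
    struct demo_buf *b = sb->pool;
    if (b != NULL) {
        sb->pool = b->next;
        sb->pool_len--;
        b->next = NULL;
        return b;
    }
    sb->global_allocs++;
    b = malloc(sizeof(*b));
    if (b != NULL)
        b->next = NULL;
    return b;
}

/* Return a buffer to the pool, or free it when the pool is full. */
static void
demo_buf_put(struct demo_sockbuf *sb, struct demo_buf *b)
{
    assert(sb != NULL && sb->locked && b != NULL);
    if (sb->pool_len >= DEMO_POOL_MAX) {
        free(b);
        return;
    }
    b->next = sb->pool;
    sb->pool = b;
    sb->pool_len++;
}
```

Because the pool is only touched while the socket buffer lock is held, the fast path in this sketch needs no extra synchronization of its own; whether that holds for the real allocation path is a question the thread itself debates.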