Re: [PATCH 1/7] net: refactor __netif_receive_skb_core

2015-04-15 Thread Alexander Duyck



On 04/15/2015 05:44 AM, David Laight wrote:

From: Alexander Duyck

Sent: 10 April 2015 20:56
On 04/10/2015 05:15 AM, Pablo Neira Ayuso wrote:

+another_round:
+   ret = __netif_receive_skb_ingress(skb, pfmemalloc, orig_dev);
+   switch (ret) {
+   case NET_RX_SUCCESS:
+   case NET_RX_DROP:
+   break;
+   case __NET_RX_ANOTHER_ROUND:
+   goto another_round;
+   }
rcu_read_unlock();
+
return ret;
  }




Couldn't this just be done as a do while?  It would probably be easier
to read and there wouldn't be any need for the another_round label anymore.


Or an infinite loop with a break at the bottom, as in:
for (;;) {
switch (...) {
case again:
continue;
default:
break;
}
break;
}

David



That is even more complicated.  What I was thinking was
do {
ret = __netif_receive_skb_ingress(skb, pfmemalloc,
  orig_dev);
} while (ret == __NET_RX_ANOTHER_ROUND);

Either that or the switch could just be replaced with a if statement 
since the only case that really goes anywhere is __NET_RX_ANOTHER_ROUND 
and everything else just exits anyway.  I had just suggested a do/while 
since that lets the goto be dropped, but an if would allow for avoiding 
any unnecessary indentation on the call to __netif_receive_skb_ingress.


- Alex
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/7] net: refactor __netif_receive_skb_core

2015-04-15 Thread David Laight
From: Alexander Duyck
 Sent: 10 April 2015 20:56
 On 04/10/2015 05:15 AM, Pablo Neira Ayuso wrote:
  +another_round:
  +   ret = __netif_receive_skb_ingress(skb, pfmemalloc, orig_dev);
  +   switch (ret) {
  +   case NET_RX_SUCCESS:
  +   case NET_RX_DROP:
  +   break;
  +   case __NET_RX_ANOTHER_ROUND:
  +   goto another_round;
  +   }
  rcu_read_unlock();
  +
  return ret;
   }
 
 
 
 Couldn't this just be done as a do while?  It would probably be easier
 to read and there wouldn't be any need for the another_round label anymore.

Or an infinite loop with a break at the bottom, as in:
for (;;) {
switch (...) {
case again:
continue;
default:
break;
}
break;
}

David


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] net: refactor __netif_receive_skb_core

2015-04-15 Thread Jesper Dangaard Brouer
On Fri, 10 Apr 2015 15:47:34 +0200
Daniel Borkmann dan...@iogearbox.net wrote:

 On 04/10/2015 02:15 PM, Pablo Neira Ayuso wrote:
  This patch splits __netif_receive_skb_core() in smaller functions to improve
  maintainability.
 
  The function __netif_receive_skb_core() has been split in two:
 
  * __netif_receive_skb_ingress(), to perform all actions up to
 ingress filtering.
 
  * __netif_receive_skb_finish(), if the ingress filter accepts this
 packet, pass it to the corresponding packet_type function handler for 
  further
  processing.
 
  This patch also adds __NET_RX_ANOTHER_ROUND that is used when the packet is
  stripped off from the vlan header or in case the rx_handler needs it.
 
  This also prepares the introduction of the netfilter ingress hook.
 
 Out of curiosity, what is actually the performance impact on all
 of this? We were just arguing on a different matter on two more
 instructions in the fast-path, here it's refactoring the whole
 function into several ones, I presume gcc won't inline it.

Pablo asked me to performance test this change.  Full test report below.

The performance effect (of this patch) depend on the Gcc compiler
version.

Two tests:
 1. IP-forwarding (unloaded netfilter modules)
 2. Early drop in iptables raw table

With GCC 4.4.7, which does not inline the new functions
(__netif_receive_skb_ingress and __netif_receive_skb_finish) the
performance impact/regression is definitly measurable.

With GCC 4.4.7:
 1. IP-forwarding: +25.18 ns (slower) (-27776 pps)
 2. Early-drop   :  +7.55 ns (slower) (-66577 pps)

With GCC 4.9.1, the new functions gets inlined, thus the refactor
splitup of __netif_receive_skb_core() is basically cancled.
Strangly there is a small improvement for forwarding, likely due to
some lucky assember reordering that give less icache/fetch-misses.
The early-drop improvement is below accuracy levels, can cannot be
trusted.

With GCC 4.9.1:
 1. IP-forwarding: -10.05ns (faster) (+17532 pps)
 2. Early-drop   :  -1.54ns (faster) (+16216 pps) below accuracy levels

I don't know what to conclude, as the result depend on the compiler
version... but these kind of change do affect performance, and should
be tested/measured.

- --
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Quick eval of Pablo's refactor of __netif_receive_skb_core
==
:Version: 0.1
:Author:  Jesper Dangaard Brouer

Summary
===

Pablo is refactoring __netif_receive_skb_core() into several
functions, to allow for some other upcomming changes.
Daniel Borkmann question if this will affect performance.
Jesper tests this.

The performance effect (of this patch) depend on the Gcc compiler
version.

Two tests:
 1. IP-forwarding (unloaded netfilter modules)
 2. Early drop in iptables raw table

With GCC 4.4.7, which does not inline the new functions
(__netif_receive_skb_ingress and __netif_receive_skb_finish) the
performance impact/regression is definitly measurable.

With GCC 4.4.7:
 1. IP-forwarding: +25.18 ns (slower) (-27776 pps)
 2. Early-drop   :  +7.55 ns (slower) (-66577 pps)

With GCC 4.9.1, the new functions gets inlined, thus the refactor
splitup of __netif_receive_skb_core() is basically cancled.
Strangly there is a small improvement for forwarding, likely due to
some lucky assember reordering that give less icache/fetch-misses.
The early-drop improvement is below accuracy levels, can cannot be
trusted.

With GCC 4.9.1:
 1. IP-forwarding: -10.05ns (faster) (+17532 pps)
 2. Early-drop   :  -1.54ns (faster) (+16216 pps) below accuracy levels


Setup
=

On host: ivy


Host ivy is the sink or DUT (Device Under Test).
 * CPU E5-2695ES @ 2.80GHz

netfilter_unload_modules.sh
netfilter_unload_modules.sh
sudo rmmod nf_reject_ipv4 nf_reject_ipv6

base_device_setup.sh eth4  # 10G sink/receiving interface (ixgbe)
base_device_setup.sh eth5
sudo ethtool --coalesce eth4 rx-usecs 30

Make a fake route to 198.18.0.0/15 out via eth5

sudo ip neigh add 192.168.21.66 dev eth5 lladdr 00:00:ba:d0:ba:d0
sudo ip route add 198.18.0.0/15 via 192.168.21.66 dev eth5

Disable power saving to get more accurate measurements (see blogpost)::

 $ sudo tuned-adm active
 Current active profile: latency-performance
 Service tuned: enabled, running
 Service ktune: enabled, running

Early drop in raw
-

alias iptables='sudo iptables'
iptables -t raw -N simple || iptables -t raw -F simple
iptables -t raw -I simple -d 198.18.0.0/15 -j DROP
iptables -t raw -D PREROUTING -j simple
iptables -t raw -I PREROUTING -j simple


On host: dragon
---

Host dragon is the packet generator.
 * 2x CPU E5-2630 0 @ 2.30GHz

Generator NIC: eth8 - ixgbe 10G

netfilter_unload_modules.sh
netfilter_unload_modules.sh
sudo rmmod nf_reject_ipv4 nf_reject_ipv6

base_device_setup.sh eth8  # 10G generator interface 

Re: [PATCH 1/7] net: refactor __netif_receive_skb_core

2015-04-15 Thread Patrick McHardy
On 15.04, Jesper Dangaard Brouer wrote:
  Out of curiosity, what is actually the performance impact on all
  of this? We were just arguing on a different matter on two more
  instructions in the fast-path, here it's refactoring the whole
  function into several ones, I presume gcc won't inline it.
 
 Pablo asked me to performance test this change.  Full test report below.
 
 The performance effect (of this patch) depend on the Gcc compiler
 version.
 
 Two tests:
  1. IP-forwarding (unloaded netfilter modules)
  2. Early drop in iptables raw table
 
 With GCC 4.4.7, which does not inline the new functions
 (__netif_receive_skb_ingress and __netif_receive_skb_finish) the
 performance impact/regression is definitly measurable.
 
 With GCC 4.4.7:
  1. IP-forwarding: +25.18 ns (slower) (-27776 pps)
  2. Early-drop   :  +7.55 ns (slower) (-66577 pps)
 
 With GCC 4.9.1, the new functions gets inlined, thus the refactor
 splitup of __netif_receive_skb_core() is basically cancled.
 Strangly there is a small improvement for forwarding, likely due to
 some lucky assember reordering that give less icache/fetch-misses.
 The early-drop improvement is below accuracy levels, can cannot be
 trusted.
 
 With GCC 4.9.1:
  1. IP-forwarding: -10.05ns (faster) (+17532 pps)
  2. Early-drop   :  -1.54ns (faster) (+16216 pps) below accuracy levels
 
 I don't know what to conclude, as the result depend on the compiler
 version... but these kind of change do affect performance, and should
 be tested/measured.

Thanks Jesper. This effect without inlinging was to be expected I guess.
The interesting question would be a patch that uses nf_hook() without okfn
callback, moves the ingress qdisc to register with the netfilter ingress
hook and moves the TTL tracking of ing_filter() to the ingress qdisc,
where it belongs.

My expectation would be that this would result in an overall performance
gain.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html