Re: [ovs-dev] [PATCH] dpif-netdev: Forwarding optimization for direct output flows.

2019-05-27 Thread Ilya Maximets
On 24.05.2019 19:20, Ben Pfaff wrote:
> On Fri, May 24, 2019 at 03:18:50PM +0300, Ilya Maximets wrote:
>> There are some cases where users want to have simple forwarding or drop
>> rules for all packets received from particular port, i.e :
>>
>>   "in_port=1,actions=2"
>>   "in_port=1,actions=IN_PORT"
>>   "in_port=1,actions=drop"
>>
>> There are also cases where complex OF flows could be simplified down
>> to simple forwarding/drop datapath flows.
>>
>> In theory, we don't need to parse packets at all to follow these flows.
>> "Direct output forwarding" optimization is intended to speed up above
>> cases.
>>
>> Design:
>>
>> Due to various implementation restrictions userspace datapath has
>> following flow fields always in exact match (i.e. it's required to
>> match at least these fields of a packet even if the OF rule doesn't
>> need that):
>>
>>   - recirc_id
>>   - in_port
>>   - packet_type
>>   - dl_type
>>   - vlan_tci
>>   - nw_frag (for ip packets)
>>
>> Not all of these fields are related to packet itself. We already
>> know the current 'recirc_id' and the 'in_port' before starting the
>> packet processing. It also seems safe to assume that we're working
>> with Ethernet packets. dpif-netdev sets exact match on 'vlan_tci'
>> to avoid issues with flow format conversion and we don't really need
>> to match with it until ofproto layer didn't ask us to.
>> So, for the simple forwarding OF rule we need to match only with
>> 'dl_type' and 'nw_frag'.
>>
>> 'in_port', 'dl_type' and 'nw_frag' could be combined in a single
>> 64bit integer that could be used as a hash in hash map.
>>
>> New per-PMD flow table 'direct_output_table' introduced to store
>> direct output flows only. 'dp_netdev_flow_add' adds flow to the
>> usual 'flow_table' and to 'direct_output_table' if the flow meets
>> following constraints:
>>
>>   - 'recirc_id' in flow match is 0.
>>   - Flow wildcards originally had wildcarded 'vlan_tci'.
>>   - Flow has no actions (drop) or exactly one action equal to
>> OVS_ACTION_ATTR_OUTPUT.
>>   - Flow wildcards contains only minimal set of non-wildcarded fields
>> (listed above).
>>
>> If the number of flows for current 'in_port' in regular 'flow_table'
>> equals number of flows for current 'in_port' in 'direct_output_table',
>> we may use direct output optimization, because all the flows we have
>> are direct output flows. This means that we only need to parse
>> 'dl_type' and 'nw_frag' to perform packet matching.
>> Now we making the unique flow mark from the 'in_port', 'dl_type' and
>> 'nw_frag' and looking for it in 'direct_output_table'.
>> On successful lookup we don't need to make full 'miniflow_extract()'.
>>
>> Unsuccessful lookup technically means that we have no sufficient flow
>> in datapath and upcall will be required. We may optimize this path
>> in the future by bypassing the EMC, SMC and dpcls lookups in this case.
>>
>> Performance improvement of this solution on a 'direct output' flows
>> should be comparable with partial HW offloading, because it parses same
>> packet fields and uses similar flow lookup scheme.
>> However, unlike partial HW offloading, it works for all port types
>> including virtual ones.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>
>> This patch was made as a point for "virtio-forwarder" discussion:
>>https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358686.html
>> However, it might be very useful by itself for usual cases too.
>>
>> Testing is very welcome. I didn't run the performance tests on real
>> systems, so I don't know the real performance impact yet.
> 
> The changes to parse_tcp_flags() look good to me.  It may be worth
> updating the comments on that function to say that the output arguments
> are set only if the function returns nonzero.
> 
> In general, I really support the idea that OVS is general-purpose but
> that we should optimize it for important use cases.  That's basically
> what we did at Nicira/VMware for network virtualization: we wanted to
> make sure that OVS was as general as possible so that many people would
> use it and adopt it, while at the same time making sure that it
> performed well for NVP/NSX.  I mean, we didn't want it to run slowly in
> other cases, obviously, but they weren't what we were internally
> benchmarking.
> 
> In my view, this patch is entirely in line with that philosophy.

Thanks!
About comments, It's a good point. I'm going to add following description:

  * 'dl_type_p' will be set only if 'packet' is an Ethernet packet.
  * 'nw_frag_p' will be set only if 'packet' is an IP packet.

It's because I need these fields even if it's not an IP or TCP packet.
I also noticed that I didn't initialize the parameters, so I might
be using them uninitialized for non-IP packets. I'll fix that too.

Few more self-review observations:
1. No need to check if 'dp_netdev_direct_output_enabled' for each packet.
   This could be done once for a batch.
2. I missed checking that packet_type equals to PT_ETH 

Re: [ovs-dev] [PATCH] dpif-netdev: Forwarding optimization for direct output flows.

2019-05-24 Thread Ben Pfaff
On Fri, May 24, 2019 at 03:18:50PM +0300, Ilya Maximets wrote:
> There are some cases where users want to have simple forwarding or drop
> rules for all packets received from particular port, i.e :
> 
>   "in_port=1,actions=2"
>   "in_port=1,actions=IN_PORT"
>   "in_port=1,actions=drop"
> 
> There are also cases where complex OF flows could be simplified down
> to simple forwarding/drop datapath flows.
> 
> In theory, we don't need to parse packets at all to follow these flows.
> "Direct output forwarding" optimization is intended to speed up above
> cases.
> 
> Design:
> 
> Due to various implementation restrictions userspace datapath has
> following flow fields always in exact match (i.e. it's required to
> match at least these fields of a packet even if the OF rule doesn't
> need that):
> 
>   - recirc_id
>   - in_port
>   - packet_type
>   - dl_type
>   - vlan_tci
>   - nw_frag (for ip packets)
> 
> Not all of these fields are related to packet itself. We already
> know the current 'recirc_id' and the 'in_port' before starting the
> packet processing. It also seems safe to assume that we're working
> with Ethernet packets. dpif-netdev sets exact match on 'vlan_tci'
> to avoid issues with flow format conversion and we don't really need
> to match with it until ofproto layer didn't ask us to.
> So, for the simple forwarding OF rule we need to match only with
> 'dl_type' and 'nw_frag'.
> 
> 'in_port', 'dl_type' and 'nw_frag' could be combined in a single
> 64bit integer that could be used as a hash in hash map.
> 
> New per-PMD flow table 'direct_output_table' introduced to store
> direct output flows only. 'dp_netdev_flow_add' adds flow to the
> usual 'flow_table' and to 'direct_output_table' if the flow meets
> following constraints:
> 
>   - 'recirc_id' in flow match is 0.
>   - Flow wildcards originally had wildcarded 'vlan_tci'.
>   - Flow has no actions (drop) or exactly one action equal to
> OVS_ACTION_ATTR_OUTPUT.
>   - Flow wildcards contains only minimal set of non-wildcarded fields
> (listed above).
> 
> If the number of flows for current 'in_port' in regular 'flow_table'
> equals number of flows for current 'in_port' in 'direct_output_table',
> we may use direct output optimization, because all the flows we have
> are direct output flows. This means that we only need to parse
> 'dl_type' and 'nw_frag' to perform packet matching.
> Now we making the unique flow mark from the 'in_port', 'dl_type' and
> 'nw_frag' and looking for it in 'direct_output_table'.
> On successful lookup we don't need to make full 'miniflow_extract()'.
> 
> Unsuccessful lookup technically means that we have no sufficient flow
> in datapath and upcall will be required. We may optimize this path
> in the future by bypassing the EMC, SMC and dpcls lookups in this case.
> 
> Performance improvement of this solution on a 'direct output' flows
> should be comparable with partial HW offloading, because it parses same
> packet fields and uses similar flow lookup scheme.
> However, unlike partial HW offloading, it works for all port types
> including virtual ones.
> 
> Signed-off-by: Ilya Maximets 
> ---
> 
> This patch was made as a point for "virtio-forwarder" discussion:
>https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358686.html
> However, it might be very useful by itself for usual cases too.
> 
> Testing is very welcome. I didn't run the performance tests on real
> systems, so I don't know the real performance impact yet.

The changes to parse_tcp_flags() look good to me.  It may be worth
updating the comments on that function to say that the output arguments
are set only if the function returns nonzero.

In general, I really support the idea that OVS is general-purpose but
that we should optimize it for important use cases.  That's basically
what we did at Nicira/VMware for network virtualization: we wanted to
make sure that OVS was as general as possible so that many people would
use it and adopt it, while at the same time making sure that it
performed well for NVP/NSX.  I mean, we didn't want it to run slowly in
other cases, obviously, but they weren't what we were internally
benchmarking.

In my view, this patch is entirely in line with that philosophy.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif-netdev: Forwarding optimization for direct output flows.

2019-05-24 Thread Ilya Maximets
There are some cases where users want to have simple forwarding or drop
rules for all packets received from particular port, i.e :

  "in_port=1,actions=2"
  "in_port=1,actions=IN_PORT"
  "in_port=1,actions=drop"

There are also cases where complex OF flows could be simplified down
to simple forwarding/drop datapath flows.

In theory, we don't need to parse packets at all to follow these flows.
"Direct output forwarding" optimization is intended to speed up above
cases.

Design:

Due to various implementation restrictions userspace datapath has
following flow fields always in exact match (i.e. it's required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci
  - nw_frag (for ip packets)

Not all of these fields are related to packet itself. We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing. It also seems safe to assume that we're working
with Ethernet packets. dpif-netdev sets exact match on 'vlan_tci'
to avoid issues with flow format conversion and we don't really need
to match with it until ofproto layer didn't ask us to.
So, for the simple forwarding OF rule we need to match only with
'dl_type' and 'nw_frag'.

'in_port', 'dl_type' and 'nw_frag' could be combined in a single
64bit integer that could be used as a hash in hash map.

New per-PMD flow table 'direct_output_table' introduced to store
direct output flows only. 'dp_netdev_flow_add' adds flow to the
usual 'flow_table' and to 'direct_output_table' if the flow meets
following constraints:

  - 'recirc_id' in flow match is 0.
  - Flow wildcards originally had wildcarded 'vlan_tci'.
  - Flow has no actions (drop) or exactly one action equal to
OVS_ACTION_ATTR_OUTPUT.
  - Flow wildcards contains only minimal set of non-wildcarded fields
(listed above).

If the number of flows for current 'in_port' in regular 'flow_table'
equals number of flows for current 'in_port' in 'direct_output_table',
we may use direct output optimization, because all the flows we have
are direct output flows. This means that we only need to parse
'dl_type' and 'nw_frag' to perform packet matching.
Now we making the unique flow mark from the 'in_port', 'dl_type' and
'nw_frag' and looking for it in 'direct_output_table'.
On successful lookup we don't need to make full 'miniflow_extract()'.

Unsuccessful lookup technically means that we have no sufficient flow
in datapath and upcall will be required. We may optimize this path
in the future by bypassing the EMC, SMC and dpcls lookups in this case.

Performance improvement of this solution on a 'direct output' flows
should be comparable with partial HW offloading, because it parses same
packet fields and uses similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Signed-off-by: Ilya Maximets 
---

This patch was made as a point for "virtio-forwarder" discussion:
   https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358686.html
However, it might be very useful by itself for usual cases too.

Testing is very welcome. I didn't run the performance tests on real
systems, so I don't know the real performance impact yet.

 lib/dpif-netdev.c | 257 +++---
 lib/flow.c|  10 +-
 lib/flow.h|   3 +-
 3 files changed, 251 insertions(+), 19 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5a6f2abac..b455bdc7b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -34,6 +34,7 @@
 
 #include "bitmap.h"
 #include "cmap.h"
+#include "ccmap.h"
 #include "conntrack.h"
 #include "coverage.h"
 #include "ct-dpif.h"
@@ -530,6 +531,8 @@ struct dp_netdev_flow {
 /* Hash table index by unmasked flow. */
 const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
  /* 'flow_table'. */
+const struct cmap_node direct_output_node; /* In dp_netdev_pmd_thread's
+ 'direct_output_table'. */
 const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */
 const ovs_u128 ufid; /* Unique flow identifier. */
 const ovs_u128 mega_ufid;/* Unique mega flow identifier. */
@@ -543,7 +546,8 @@ struct dp_netdev_flow {
 struct ovs_refcount ref_cnt;
 
 bool dead;
-uint32_t mark;   /* Unique flow mark assigned to a flow */
+uint32_t mark;   /* Unique flow mark for netdev offloading. */
+uint64_t direct_output_mark; /* Unique flow mark for direct output. */
 
 /* Statistics. */
 struct dp_netdev_flow_stats stats;
@@ -658,12 +662,19 @@ struct dp_netdev_pmd_thread {
 
 /* Flow-Table and classifiers
  *
- * Writers of 'flow_table' must take the 'flow_mutex'.  Corresponding
- * changes to 'classifiers' must be made while still holding the
- * 'flow_mutex'.
+ * Writers of