[ovs-dev] [patch_v3 1/3] dpif-netdev: Fix per packet cycles statistics.

2017-08-14 Thread Darrell Ball
From: Ilya Maximets 

DP_STAT_LOOKUP_HIT statistics were mistakenly used in the calculation
of the total number of packets. This leads to completely wrong
per-packet cycles statistics.

For example:

emc hits:0
megaflow hits:253702308
avg. subtable lookups per hit:1.50
miss:0
lost:0
avg cycles per packet: 248.32 (157498766585/634255770)

In this case the total_packets value of 634255770 was used for the
avg per packet calculation:

  total_packets = 'megaflow hits' + 'megaflow hits' * 1.5

The real value should be 620.80 (157498766585/253702308).
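As a sanity check, the erroneous denominator can be reproduced from the
counters above:

  253702308 + 253702308 * 1.5 = 634255770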

Fix that by summing only the stats that reflect match/no match.
Direct summing of the required values was chosen over disabling some
stats in a loop, to make the calculation clearer and to avoid similar
issues in the future.

CC: Jan Scheurich 
Fixes: 3453b4d62a98 ("dpif-netdev: dpcls per in_port with sorted subtables")
Signed-off-by: Ilya Maximets 
Acked-by: Jan Scheurich 
---
 lib/dpif-netdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e2cd931..17e1666 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -755,7 +755,7 @@ pmd_info_show_stats(struct ds *reply,
 unsigned long long stats[DP_N_STATS],
 uint64_t cycles[PMD_N_CYCLES])
 {
-unsigned long long total_packets = 0;
+unsigned long long total_packets;
 uint64_t total_cycles = 0;
 int i;
 
@@ -771,13 +771,12 @@ pmd_info_show_stats(struct ds *reply,
 } else {
 stats[i] = 0;
 }
-
-if (i != DP_STAT_LOST) {
-/* Lost packets are already included in DP_STAT_MISS */
-total_packets += stats[i];
-}
 }
 
+/* Sum of all the matched and not matched packets gives the total.  */
+total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
++ stats[DP_STAT_MISS];
+
 for (i = 0; i < PMD_N_CYCLES; i++) {
 if (cycles[i] > pmd->cycles_zero[i]) {
cycles[i] -= pmd->cycles_zero[i];
-- 
1.9.1



[ovs-dev] [patch_v3 0/3] dpif-netdev: Fix and refactor pmd stats.

2017-08-14 Thread Darrell Ball
Fix total pmd stats calculation, refactor pmd stats, enhance
the pmd stats test and clarify pmd-stats-show output.

Darrell Ball (2):
  dpif-netdev: Refactor some pmd stats.
  tests: Enhance the pmd stats test.

Ilya Maximets (1):
  dpif-netdev: Fix per packet cycles statistics.

 lib/dpif-netdev.c | 79 ++-
 tests/pmd.at  | 31 +-
 2 files changed, 74 insertions(+), 36 deletions(-)

-- 
1.9.1



Re: [ovs-dev] [PATCH] datapath-windows: Do not modify port field for ICMP during SNAT/DNAT

2017-08-14 Thread Alin Serdean
We should revisit how we do hashes and compares over the 'OVS_CT_KEY' at some 
point.

As you pointed out ("/* icmp_id and port overlap in the union */"),
you can drop the lines:
>  HASH_ADD(src.port);
>  HASH_ADD(dst.port);
and
>  FIELD_COMPARE(src.port);
>  FIELD_COMPARE(dst.port);
and the outcome should be the same.
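A minimal standalone sketch of why this holds (an illustrative struct, not
the actual OVS_CT_KEY layout): when two fields alias through a union,
hashing or comparing either one touches the same bytes.

    #include <stdint.h>
    #include <stdio.h>

    struct ct_endpoint_sketch {
        union {
            uint16_t port;      /* TCP/UDP port. */
            uint16_t icmp_id;   /* ICMP identifier. */
        };
    };

    int main(void)
    {
        struct ct_endpoint_sketch ep = { .port = 0x1234 };

        /* Both names read the same storage, so including both in a hash
         * or comparison is redundant. */
        printf("port=%#x icmp_id=%#x\n", ep.port, ep.icmp_id);
        return 0;
    }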

Everything else looks good.
Thanks,
Alin.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Anand Kumar
> Sent: Friday, August 11, 2017 6:59 AM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH] datapath-windows: Do not modify port field for
> ICMP during SNAT/DNAT
> 
> During SNAT/DNAT, we should not update the port field of the ct_endpoint
> struct, as ICMP packets do not have port information. Since port and icmp_id
> overlap in the ct_endpoint struct, the icmp_id gets changed.
> As a result, NAT lookup fails to find a matching entry.
> 
> This patch addresses the issue by not modifying the icmp_id field during
> SNAT/DNAT for ICMP traffic.
> 
> The current NAT module doesn't take the ICMP type/id/code into account
> during the lookups. Fix this to make it similar with the other conntrack
> module.
> 
> Signed-off-by: Anand Kumar 
> ---
>  datapath-windows/ovsext/Conntrack-nat.c | 23 --
> -
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/datapath-windows/ovsext/Conntrack-nat.c b/datapath-
> windows/ovsext/Conntrack-nat.c
> index ae6b92c..eb6f9db 100644
> --- a/datapath-windows/ovsext/Conntrack-nat.c
> +++ b/datapath-windows/ovsext/Conntrack-nat.c
> @@ -22,6 +22,12 @@ OvsHashNatKey(const OVS_CT_KEY *key)
>  HASH_ADD(src.port);
>  HASH_ADD(dst.port);
>  HASH_ADD(zone);
> +/* icmp_id and port overlap in the union */
> +HASH_ADD(src.icmp_type);
> +HASH_ADD(dst.icmp_type);
> +HASH_ADD(src.icmp_code);
> +HASH_ADD(dst.icmp_code);
> +
>  #undef HASH_ADD
>  return hash;
>  }
> @@ -44,6 +50,12 @@ OvsNatKeyAreSame(const OVS_CT_KEY *key1, const
> OVS_CT_KEY *key2)
>  FIELD_COMPARE(src.port);
>  FIELD_COMPARE(dst.port);
>  FIELD_COMPARE(zone);
> +FIELD_COMPARE(src.icmp_id);
> +FIELD_COMPARE(dst.icmp_id);
> +FIELD_COMPARE(src.icmp_type);
> +FIELD_COMPARE(dst.icmp_type);
> +FIELD_COMPARE(src.icmp_code);
> +FIELD_COMPARE(dst.icmp_code);
>  return TRUE;
>  #undef FIELD_COMPARE
>  }
> @@ -253,6 +265,7 @@ OvsNatAddEntry(OVS_NAT_ENTRY* entry)
>   * Update an Conntrack entry with NAT information. Translated address
> and
>   * port will be generated and write back to the conntrack entry as a
>   * result.
> + * Note: For ICMP, only address is translated.
>  *--------------------------------------------------------------------------
>  */
>  BOOLEAN
> @@ -271,7 +284,7 @@ OvsNatTranslateCtEntry(OVS_CT_ENTRY *entry)
>  BOOLEAN allPortsTried;
>  BOOLEAN originalPortsTried;
>  struct ct_addr firstAddr;
> -
> +
>  uint32_t hash = OvsNatHashRange(entry, 0);
> 
>  if ((entry->natInfo.natAction & NAT_ACTION_SRC) && @@ -310,10
> +323,14 @@ OvsNatTranslateCtEntry(OVS_CT_ENTRY *entry)
>  for (;;) {
>  if (entry->natInfo.natAction & NAT_ACTION_SRC) {
>  entry->rev_key.dst.addr = ctAddr;
> -entry->rev_key.dst.port = htons(port);
> +if (entry->rev_key.nw_proto != IPPROTO_ICMP) {
> +entry->rev_key.dst.port = htons(port);
> +}
>  } else {
>  entry->rev_key.src.addr = ctAddr;
> -entry->rev_key.src.port = htons(port);
> +if (entry->rev_key.nw_proto != IPPROTO_ICMP) {
> +entry->rev_key.src.port = htons(port);
> +}
>  }
> 
>  OVS_NAT_ENTRY *natEntry = OvsNatLookup(&entry->rev_key, TRUE);
> --
> 2.9.3.windows.1
> 


Re: [ovs-dev] [PATCH] datapath-windows: Update ICMP-Type and Code comparison in CT lookup

2017-08-14 Thread Alin Serdean
Hi Sai and Anand,

Thanks a lot for the patch. I have a few questions regarding the approach. 
Please see inline.

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Anand Kumar
> Sent: Friday, August 11, 2017 11:42 PM
> To: d...@openvswitch.org
> Subject: [ovs-dev] [PATCH] datapath-windows: Update ICMP-Type and Code
> comparison in CT lookup
> 
>  - Update the CT comparison function to compare individual fields instead of
> NdisEqualMemory.
[Alin Serdean] I don't like this change, especially mixing both members of 
the union, i.e.:
> +(ctxKey.dst.port == entryKey.dst.port) &&
> +(ctxKey.dst.icmp_id == entryKey.dst.icmp_id) &&
Why are you changing to a member-by-member comparison?
> - Add in some padding for the ct_endpoint's union.
[Alin Serdean] Why is this needed? Another question: do we still need the 'pad' 
member inside ct_endpoint 
(https://github.com/openvswitch/ovs/blob/master/datapath-windows/ovsext/Conntrack.h#L51)?
> - Update the Orig Tuple to use ICMP Type and Code instead of Port for ICMP
[Alin Serdean] Agreed
> 
> Co-authored-by: Sairam Venugopal 
> Signed-off-by: Anand Kumar 


Re: [ovs-dev] [patch_v2 2/3] dpif-netdev: Refactor some pmd stats.

2017-08-14 Thread Darrell Ball


-Original Message-
From: Ilya Maximets 
Date: Monday, August 14, 2017 at 5:36 AM
To: "ovs-dev@openvswitch.org" , Darrell Ball 

Cc: 'Jan Scheurich' , Antonio Fischetti 

Subject: Re: Re: [ovs-dev] [patch_v2 2/3] dpif-netdev: Refactor some pmd stats.

> The per-packet stats are presently overlapping in that
> the miss stats include the lost stats; make these stats non-overlapping
> for clarity and make this clear in the dp_stat_type enum.  This
> also eliminates the need to increment two 'miss' stats for a
> single packet.
> 
> The subtable lookup stat is renamed to make it
> clear that it relates to masked lookups.
> 
> The stats that total to the number of packets seen are defined
> in dp_stat_type and an api is created to total the stats in case
> these stats are further divided in the future.
> 
> Signed-off-by: Darrell Ball 
> ---
>  lib/dpif-netdev.c | 58 
---
>  1 file changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 17e1666..38f5203 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -323,12 +323,21 @@ static struct dp_netdev_port 
*dp_netdev_lookup_port(const struct dp_netdev *dp,
>  OVS_REQUIRES(dp->port_mutex);
>  
>  enum dp_stat_type {
> -DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
> -DP_STAT_MASKED_HIT, /* Packets that matched in the flow 
table. */
> -DP_STAT_MISS,   /* Packets that did not match. */
> -DP_STAT_LOST,   /* Packets not passed up to the client. 
*/
> -DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow 
table
> -   hits */
> +DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
> +DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
> +DP_STAT_MISS,   /* Packets that did not match and were passed
> +   up to the client. */

This definition of 'DP_STAT_MISS' looks illogical to me. 'MISS' means
a cache miss (EMC or CLS), but with the new definition the name becomes
meaningless.
Also, there is an informal concept in OVS: 'execute a miss', which
means the upcall execution. And users expect the miss counter to reflect
the number of upcalls executed.

The intention was for DP_STAT_MISS to mean that there was a miss in the cache 
(which includes the exact match and masked caches) and an upcall really 
happened. So this ‘is’ what the user expects by ‘execute a miss’ in OVS.
I see 2 related bugs here; I fixed them.
Also, I think the output of pmd-stats-show should have some text making this 
clear; I will add it.

Maybe we can define something like DP_N_CACHE_STAT to split the cache
(EMC and CLS) hit/miss stats from others?

This is what DP_N_TOT_PKT_STAT is for. Adding up the cache hit/miss stats 
should total to the number of packets seen. I just realized that this name 
does not need _STAT, so I removed it.


> +DP_STAT_LOST,   /* Packets that did not match and were not
> +   passed up to client. */
> +DP_N_TOT_PKT_STAT,  /* The above statistics account for the total
> +   number of packets seen and should be
> +   non overlapping with each other. */
> +DP_STAT_MASKED_LOOKUP_HIT = DP_N_TOT_PKT_STAT,  /* Number of subtable
> +   lookups for flow 
table
> +   hits. Each 
MASKED_HIT
> +   hit will have >= 1
> +   MASKED_LOOKUP_HIT
> +   hit(s). */

Do we need the '_HIT' suffix for DP_STAT_MASKED_LOOKUP_HIT?
This is kind of a strange name, because we're counting lookups, not hits.

I agree.
_HIT is not correct as part of this name; I can fix it as part of this patch.


>  DP_N_STATS
>  };
>  
> @@ -749,13 +758,22 @@ enum pmd_info_type {
>  PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */
>  };
>  
> +static unsigned long long
> +dp_netdev_calcul_total_packets(unsigned long long *stats)
> +{
> +unsigned long long total_packets = 0;
> +for (uint8_t i = 0; i < DP_N_TOT_PKT_STAT; i++) {
> +total_packets += stats[i];
> +}
> +return total_packets;
> +}
> +
>  static void
>  pmd_info_show_stats(struct ds *reply,
>  struct 

[ovs-dev] [PATCH 2/2] python: Force file system encoding on cmdline args

2017-08-14 Thread Alin Balutoiu
On Windows, the default file system encoding is 'mbcs',
resulting in a bad conversion.

To make it cross-platform tolerant, use
'sys.getfilesystemencoding()' instead of 'utf-8'.


Co-authored-by: Alin Serdean 
Signed-off-by: Alin Balutoiu 
Signed-off-by: Alin Serdean 
---
 tests/test-ovsdb.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py
index df9d6d5..fc42a2d 100644
--- a/tests/test-ovsdb.py
+++ b/tests/test-ovsdb.py
@@ -337,7 +337,8 @@ def idl_set(idl, commands, step):
 if six.PY2:
 s.s = args[2].decode('utf-8')
 else:
-s.s = args[2].encode('utf-8', 'surrogateescape') \
+s.s = args[2].encode(sys.getfilesystemencoding(),
+ 'surrogateescape') \
  .decode('utf-8', 'replace')
 elif args[1] == "u":
 s.u = uuid.UUID(args[2])
-- 
2.10.0.windows.1


[ovs-dev] [PATCH 1/2] python: fix python3 encode/decode on Windows

2017-08-14 Thread Alin Balutoiu
Fix double encoding/decoding of data, caused by
'get_decoded_buffer' and 'get_encoded_buffer'.

The functions 'get_decoded_buffer' and 'get_encoded_buffer'
from winutils have been removed. They are no longer
necessary since the buffers received/returned are already
in the right form.

The necessary encoding has been moved before any sending
function (this also includes named pipe sends on Windows).

Co-authored-by: Alin Serdean 
Signed-off-by: Alin Balutoiu 
Signed-off-by: Alin Serdean 
---
 python/ovs/stream.py   | 26 +++---
 python/ovs/winutils.py | 18 --
 2 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/python/ovs/stream.py b/python/ovs/stream.py
index 57e7a6e..f82a449 100644
--- a/python/ovs/stream.py
+++ b/python/ovs/stream.py
@@ -322,8 +322,10 @@ class Stream(object):
 False)
 self._read_pending = False
 recvBuffer = self._read_buffer[:nBytesRead]
-
-return (0, winutils.get_decoded_buffer(recvBuffer))
+# recvBuffer will have the type memoryview in Python3.
+# We can use bytes to convert it to type bytes which works on
+# both Python2 and Python3.
+return (0, bytes(recvBuffer))
 except pywintypes.error as e:
 if e.winerror == winutils.winerror.ERROR_IO_INCOMPLETE:
 # The operation is still pending, try again
@@ -334,7 +336,6 @@ class Stream(object):
 return (0, "")
 else:
 return (errno.EINVAL, "")
-
 (errCode, self._read_buffer) = winutils.read_file(self.pipe,
   n,
   self._read)
@@ -361,7 +362,10 @@ class Stream(object):
 return (e.winerror, "")
 
 recvBuffer = self._read_buffer[:nBytesRead]
-return (0, winutils.get_decoded_buffer(recvBuffer))
+# recvBuffer will have the type memoryview in Python3.
+# We can use bytes to convert it to type bytes which works on
+# both Python2 and Python3.
+return (0, bytes(recvBuffer))
 
 def send(self, buf):
 """Tries to send 'buf' on this stream.
@@ -380,16 +384,17 @@ class Stream(object):
 elif len(buf) == 0:
 return 0
 
+# Python 3 has separate types for strings and bytes.  We must have
+# bytes here.
+if six.PY3 and not isinstance(buf, bytes):
+buf = bytes(buf, 'utf-8')
+elif six.PY2:
+buf = buf.encode('utf-8')
+
 if sys.platform == 'win32' and self.socket is None:
 return self.__send_windows(buf)
 
 try:
-# Python 3 has separate types for strings and bytes.  We must have
-# bytes here.
-if six.PY3 and not isinstance(buf, bytes):
-buf = bytes(buf, 'utf-8')
-elif six.PY2:
-buf = buf.encode('utf-8')
 return self.socket.send(buf)
 except socket.error as e:
 return -ovs.socket_util.get_exception_errno(e)
@@ -413,7 +418,6 @@ class Stream(object):
 else:
 return -errno.EINVAL
 
-buf = winutils.get_encoded_buffer(buf)
 self._write_pending = False
 (errCode, nBytesWritten) = winutils.write_file(self.pipe,
buf,
diff --git a/python/ovs/winutils.py b/python/ovs/winutils.py
index a3627ff..c29226f 100644
--- a/python/ovs/winutils.py
+++ b/python/ovs/winutils.py
@@ -198,24 +198,6 @@ def get_overlapped_result(handle, overlapped=None, 
bWait=False):
 raise
 
 
-def get_decoded_buffer(recvBuffer):
-if six.PY3:
-return bytes(recvBuffer).decode("utf-8")
-else:
-return str(recvBuffer)
-
-
-def get_encoded_buffer(buff):
-# Python 3 has separate types for strings and bytes.
-# We must have bytes here.
-if not isinstance(buff, six.binary_type):
-if six.PY3:
-buff = six.binary_type(buff, 'utf-8')
-else:
-buff = six.binary_type(buff)
-return buff
-
-
 def get_new_event(sa=None, bManualReset=True, bInitialState=True,
   objectName=None):
 return win32event.CreateEvent(sa, bManualReset, bInitialState, objectName)
-- 
2.10.0.windows.1


Re: [ovs-dev] [PATCH 4/4] netdev-tc-offloads: Add support for action set

2017-08-14 Thread Joe Stringer
On 8 August 2017 at 07:21, Roi Dayan  wrote:
> From: Paul Blakey 
>
> Implement support for offloading ovs action set using
> tc header rewrite action.
>
> Signed-off-by: Paul Blakey 
> Reviewed-by: Roi Dayan 
> ---



> @@ -274,6 +346,48 @@ netdev_tc_flow_dump_destroy(struct netdev_flow_dump 
> *dump)
>  return 0;
>  }
>
> +static void
> +parse_flower_rewrite_to_netlink_action(struct ofpbuf *buf,
> +   struct tc_flower *flower)
> +{
> +char *mask = (char *) &flower->rewrite.mask;
> +char *data = (char *) &flower->rewrite.key;
> +
> +for (int type = 0; type < ARRAY_SIZE(set_flower_map); type++) {
> +char *put = NULL;
> +size_t nested = 0;
> +int len = ovs_flow_key_attr_lens[type].len;
> +
> +if (len <= 0) {
> +continue;
> +}
> +
> +for (int j = 0; j < ARRAY_SIZE(set_flower_map[type]); j++) {
> +struct netlink_field *f = &set_flower_map[type][j];
> +
> +if (!f->size) {
> +break;
> +}

Seems like maybe there's similar feedback here to the previous patch
wrt sparsely populated array set_flower_map[].

> +
> +if (!is_all_zeros(mask + f->flower_offset, f->size)) {
> +if (!put) {
> +nested = nl_msg_start_nested(buf,
> + OVS_ACTION_ATTR_SET_MASKED);
> +put = nl_msg_put_unspec_zero(buf, type, len * 2);

Do we ever have multiple of these attributes in a single nested
OVS_ACTION_ATTR_SET_MASKED? If so, won't the following memcpys avoid
putting the netlink header for the subsequent sets?

> +}
> +
> +memcpy(put + f->offset, data + f->flower_offset, f->size);
> +memcpy(put + len + f->offset,
> +   mask + f->flower_offset, f->size);

Could we combine these with the nl_msg_put_unspec() above?


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Yang, Yi
On Tue, Aug 15, 2017 at 12:09:14AM +0800, Eric Garver wrote:
> On Thu, Aug 10, 2017 at 09:21:15PM +0800, Yi Yang wrote:
> 
> Hi Yi,
> 
> In general I'd like to echo Jiri's comments on the netlink attributes.
> I'd like to see the metadata separate.
> 
> I have a few other comments below.
> 
> Thanks.
> Eric.
> 
> [..]

Thanks Eric, I'm doing this and it is almost done, there is still an
issue to fix.

> > +{
> > +   return 4 * (ntohs(nsh->ver_flags_len) & NSH_LEN_MASK) >> NSH_LEN_SHIFT;
> 
> This is doing the multiplication before the shift. It works only because
> the shift is 0.
> 

Thank you for catching this; the right one is:

((ntohs(nsh->ver_flags_len) & NSH_LEN_MASK) >> NSH_LEN_SHIFT) << 2
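A small standalone sketch of the precedence issue (hypothetical field value,
and a nonzero shift chosen only to make the divergence visible):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t ver_flags_len = 0x0006;   /* Length field = 6 words. */
        unsigned mask = 0x003f;

        /* With NSH_LEN_SHIFT == 0 the two forms happen to agree (both 24). */
        printf("shift 0: %u vs %u\n",
               4 * (ver_flags_len & mask) >> 0,
               ((ver_flags_len & mask) >> 0) << 2);

        /* With a hypothetical nonzero shift they diverge, because '*'
         * binds tighter than '>>': (4 * x) >> s  !=  ((x >> s)) << 2. */
        printf("shift 2: %u vs %u\n",
               4 * (ver_flags_len & mask) >> 2,
               ((ver_flags_len & mask) >> 2) << 2);
        return 0;
    }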

> > +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
> > +   const struct ovs_action_push_nsh *oapn)
> > +{
> > +   struct nsh_hdr *nsh;
> > +   size_t length = NSH_BASE_HDR_LEN + oapn->mdlen;
> > +   u8 next_proto;
> > +
> > +   if (key->mac_proto == MAC_PROTO_ETHERNET) {
> > +   next_proto = NSH_P_ETHERNET;
> > +   } else {
> > +   switch (ntohs(skb->protocol)) {
> > +   case ETH_P_IP:
> > +   next_proto = NSH_P_IPV4;
> > +   break;
> > +   case ETH_P_IPV6:
> > +   next_proto = NSH_P_IPV6;
> > +   break;
> > +   case ETH_P_NSH:
> > +   next_proto = NSH_P_NSH;
> > +   break;
> > +   default:
> > +   return -ENOTSUPP;
> > +   }
> > +   }
> > +
> >
> 
> I believe you need to validate that oapn->mdlen is a multiple of 4.
>

I'll add this check.
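A minimal sketch of such a check, reusing the field name from the quoted
patch (the real check would live in the netlink validation path):

    #include <errno.h>

    /* The NSH Length field counts 4-byte words, so any MD2 metadata
     * length must be a multiple of 4. */
    static int validate_nsh_mdlen(unsigned int mdlen)
    {
        return (mdlen & 0x3) ? -EINVAL : 0;
    }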

> > +   switch (nsh->md_type) {
> > +   case NSH_M_TYPE1:
> > +   nsh->md1 = *(struct nsh_md1_ctx *)oapn->metadata;
> > +   break;
> > +   case NSH_M_TYPE2: {
> > +   /* The MD2 metadata in oapn is already padded to 4 bytes. */
> > +   size_t len = DIV_ROUND_UP(oapn->mdlen, 4) * 4;
> > +
> > +   memcpy(nsh->md2, oapn->metadata, len);
> 
> I don't see any validation of oapn->mdlen. Normally this happens in
> __ovs_nla_copy_actions(). It will be made easier if you add a separate
> MD attribute as Jiri has suggested.
>

Got it, will include this in next version.


Re: [ovs-dev] [PATCH 3/4] tc: Add header rewrite using tc pedit action

2017-08-14 Thread Joe Stringer
On 8 August 2017 at 07:21, Roi Dayan  wrote:
> From: Paul Blakey 
>
> To be later used to implement ovs action set offloading.
>
> Signed-off-by: Paul Blakey 
> Reviewed-by: Roi Dayan 
> ---

Hi Paul and folks, some questions inline.

utilities/checkpatch.py says:

$ utilities/checkpatch.py -4
== Checking 00ecb482f5d1 ("compat: Add act_pedit compatibility for old
kernels") ==
Lines checked: 127, no obvious problems found

== Checking 7503746718f6 ("odp-util: Expose ovs flow key attr len
table for reuse") ==
Lines checked: 195, no obvious problems found

== Checking 1e8ab68aefe1 ("tc: Add header rewrite using tc pedit action") ==
ERROR: Improper whitespace around control block
#359 FILE: lib/tc.c:480:
   NL_ATTR_FOR_EACH(nla, left, keys_ex, keys_ex_size) {

Lines checked: 721, Warnings: 0, Errors: 1

== Checking f6b52ce504c2 ("netdev-tc-offloads: Add support for action set") ==
ERROR: Improper whitespace around control block
#908 FILE: lib/netdev-tc-offloads.c:590:
   NL_ATTR_FOR_EACH_UNSAFE(set_attr, set_left, set, set_len) {

Lines checked: 981, Warnings: 0, Errors: 1



> +static struct flower_key_to_pedit flower_pedit_map[] = {
> +[offsetof(struct tc_flower_key, ipv4.ipv4_src)] = {
> +TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
> +12,
> +MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_src)
> +},

Why are these items indexed by the offset, and not just an array we
iterate? It seems like the only places where we iterate this, we use
"for (i=0; ... i++)", which means that we iterate several times across
empty items, and this array is sparsely populated.
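A hedged sketch of that dense-array alternative, with illustrative names and
values rather than the actual OVS structures:

    #include <stddef.h>

    struct pedit_map_entry {
        int htype;          /* Pedit header type. */
        int pedit_off;      /* Byte offset in the protocol header. */
        size_t flower_off;  /* Byte offset in struct tc_flower_key. */
        size_t size;        /* Field size in bytes. */
    };

    static const struct pedit_map_entry map_sketch[] = {
        { 1, 12, 0, 4 },    /* e.g. IPv4 source address. */
        { 1, 16, 4, 4 },    /* e.g. IPv4 destination address. */
    };

    size_t
    count_rewrite_fields(void)
    {
        size_t i, n = 0;

        /* Every iteration visits a real entry, so no
         * "if (!m->size) continue;" is needed, unlike with the sparse
         * offsetof()-indexed layout. */
        for (i = 0; i < sizeof map_sketch / sizeof map_sketch[0]; i++) {
            n += map_sketch[i].size != 0;
        }
        return n;
    }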

> +[offsetof(struct tc_flower_key, ipv4.ipv4_dst)] = {
> +TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
> +16,
> +MEMBER_SIZEOF(struct tc_flower_key, ipv4.ipv4_dst)
> +},
> +[offsetof(struct tc_flower_key, ipv4.rewrite_ttl)] = {
> +TCA_PEDIT_KEY_EX_HDR_TYPE_IP4,
> +8,
> +MEMBER_SIZEOF(struct tc_flower_key, ipv4.rewrite_ttl)
> +},



> @@ -346,6 +425,96 @@ nl_parse_flower_ip(struct nlattr **attrs, struct 
> tc_flower *flower) {
>  }
>  }
>
> +static const struct nl_policy pedit_policy[] = {
> +[TCA_PEDIT_PARMS_EX] = { .type = NL_A_UNSPEC,
> + .min_len = sizeof(struct tc_pedit),
> + .optional = false, },
> +[TCA_PEDIT_KEYS_EX]   = { .type = NL_A_NESTED,
> +  .optional = false, },
> +};
> +
> +static int
> +nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
> +{
> +struct nlattr *pe_attrs[ARRAY_SIZE(pedit_policy)];
> +const struct tc_pedit *pe;
> +const struct tc_pedit_key *keys;
> +const struct nlattr *nla, *keys_ex, *ex_type;
> +const void *keys_attr;
> +char *rewrite_key = (void *) &flower->rewrite.key;
> +char *rewrite_mask = (void *) &flower->rewrite.mask;

These are actually 'struct tc_flower_key', shouldn't we use the actual
types? (I realise there is pointer arithmetic below, but usually we
restrict the usage of void casting and pointer arithmetic as much as
possible).

> +size_t keys_ex_size, left;
> +int type, i = 0;
> +
> +if (!nl_parse_nested(options, pedit_policy, pe_attrs,
> + ARRAY_SIZE(pedit_policy))) {
> +VLOG_ERR_RL(_rl, "failed to parse pedit action options");
> +return EPROTO;
> +}
> +
> +pe = nl_attr_get_unspec(pe_attrs[TCA_PEDIT_PARMS_EX], sizeof *pe);
> +keys = pe->keys;
> +keys_attr = pe_attrs[TCA_PEDIT_KEYS_EX];
> +keys_ex = nl_attr_get(keys_attr);
> +keys_ex_size = nl_attr_get_size(keys_attr);
> +
> +NL_ATTR_FOR_EACH(nla, left, keys_ex, keys_ex_size) {
> +if (i >= pe->nkeys) {
> +break;
> +}
> +
> +if (nl_attr_type(nla) == TCA_PEDIT_KEY_EX) {
> +ex_type = nl_attr_find_nested(nla, TCA_PEDIT_KEY_EX_HTYPE);
> +type = nl_attr_get_u16(ex_type);
> +
> +for (int j = 0; j < ARRAY_SIZE(flower_pedit_map); j++) {
> +struct flower_key_to_pedit *m = &flower_pedit_map[j];
> +int flower_off = j;
> +int sz = m->size;
> +int mf = m->offset;
> +
> +if (!sz || m->htype != type) {
> +   continue;
> +}

If flower_pedit_map was just a regular array and the offset was
included in 'struct flower_key_to_pedit', then I don't think we need
the above check?

> +
> +/* check overlap between current pedit key, which is always
> + * 4 bytes (range [off, off + 3]), and a map entry in
> + * flower_pedit_map (range [mf, mf + sz - 1]) */
> +if ((keys->off >= mf && keys->off < mf + sz)
> +|| (keys->off + 3 >= mf && keys->off + 3 < mf + sz)) {
> +int diff = flower_off + (keys->off - mf);
> +

[ovs-dev] pwclient XML errors and form feeds

2017-08-14 Thread Mark Michelson
Earlier today on IRC I brought up an error I had while attempting to use
pwclient to get a patch from patchwork. Here [1] is the pastebin I linked
to. At the time, I thought it was due to HTTP chunked encoding being used
and the xmlrpclib or HTTP client library not handling that properly.

I debugged the problem further and have now figured out that the problem
has to do with a form feed character being present in the patch. The
Patchwork server places the patch text inside an XML 1.0 document; however,
form-feeds are illegal in XML 1.0 [2]. This makes the use of form feeds and
the use of pwclient incompatible.
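For reference, a small sketch of the XML 1.0 "Char" production from [2]; form
feed (0x0C) falls outside every allowed range, which is why the generated XML
document is rejected:

    #include <stdbool.h>
    #include <stdint.h>

    /* XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
     *                 | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
    static bool
    xml10_char_ok(uint32_t c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
               || (c >= 0x20 && c <= 0xD7FF)
               || (c >= 0xE000 && c <= 0xFFFD)
               || (c >= 0x10000 && c <= 0x10FFFF);
    }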

Aaron Conole pointed out that once Patchwork 2.0 is available, the git-pw
command will render pwclient obsolete. I checked, and git-pw will use
Patchwork 2.0's REST API (which is JSON-based) instead of XMLRPC, so this
is indeed only a temporary problem.

In the meantime, is there some other command line tool OVS devs use instead
of pwclient for getting patches?

Also, can someone inform me why the coding guidelines encourage using
form feeds? Do they aid in documentation generation? Do people use editors
that render them in some meaningful way? Searching git logs and the mailing
list archives didn't give me anything to go on.

Thanks,
Mark Michelson

[1] https://paste.fedoraproject.org/paste/gE6neR6-Yf19YnO7hM576g
[2] https://www.w3.org/TR/REC-xml/#charsets


[ovs-dev] [PATCH] redhat: fix upgrades where group doesn't exist

2017-08-14 Thread Aaron Conole
The upgrade from older Open vSwitch versions on RHEL will try, as much as
possible, to preserve the system.  This means no new users or groups are
created.  As a result, it's possible for the chown to fail, because the
hugetlbfs group may not exist.  While it did exist on my systems, it was not
there on others.

This change allows the ExecStartPre commands to fail.  In the case that the
user doesn't use DPDK, it won't matter anyway.

Fixes: e3e738a3d058 ('redhat: allow dpdk to also run as non-root user')
Signed-off-by: Aaron Conole 
Reported-by: Jean-Tsung Hsiao 
Tested-by: Jean-Tsung Hsiao 
---

 rhel/usr_lib_systemd_system_ovs-vswitchd.service.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in 
b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
index bf0f058..c6d9aa1 100644
--- a/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
+++ b/rhel/usr_lib_systemd_system_ovs-vswitchd.service.in
@@ -14,8 +14,8 @@ Environment=HOME=/var/run/openvswitch
 EnvironmentFile=/etc/openvswitch/default.conf
 EnvironmentFile=-/etc/sysconfig/openvswitch
 @begin_dpdk@
-ExecStartPre=/usr/bin/chown :hugetlbfs /dev/hugepages
-ExecStartPre=/usr/bin/chmod 0775 /dev/hugepages
+ExecStartPre=-/usr/bin/chown :hugetlbfs /dev/hugepages
+ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
 @end_dpdk@
 ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
   --no-ovsdb-server --no-monitor --system-id=random \
-- 
2.9.4



Re: [ovs-dev] OVS 2.9 Intel Roadmap

2017-08-14 Thread Stokes, Ian
> 
> That is the one in flight now.
> I guess there are no others being planned, then?
> 

Not from Intel that I am aware of.

> 
> -Original Message-
> From: "Stokes, Ian" 
> Date: Monday, August 14, 2017 at 2:56 AM
> To: Jan Scheurich , Darrell Ball
> , "ovs-dev@openvswitch.org" 
> Cc: "O Mahony, Billy" 
> Subject: RE: [ovs-dev] OVS 2.9 Intel Roadmap
> 
> : RE: [ovs-dev] OVS 2.9 Intel Roadmap
> 
> > What about Billy's patch set for Rx queue prioritization?
> > https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336001.html
> > Jan
> >
> Good catch Jan, I missed that myself (Billy was out of office last
> week when putting the list together). I'll add this to the list.
> 
> Thanks
> Ian
> 
> > > -Original Message-
> > > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > > boun...@openvswitch.org] On Behalf Of Stokes, Ian
> > > Sent: Monday, 14 August, 2017 10:45
> > > To: Darrell Ball ; ovs-dev@openvswitch.org
> > > Subject: Re: [ovs-dev] OVS 2.9 Intel Roadmap
> > >
> > > > Any prioritization or other qos features missing ?
> > > >
> > > Hi Darrell,
> > >
> > > Do you mean the 2 qos patchsets on the list currently (support for
> > > queues and move qos in do_tx_copy?).
> > >
> > > If so, I didn’t include these in the list as Intel were not the
> > > authors. I do plan to provide feedback on them though but the last
> > > few weeks have been a bit hectic on my side.
> > >
> > > Are there any other QoS features you are referring to that are
> > > missing?
> > >
> > > Ian
> 
> 



Re: [ovs-dev] OVS 2.9 Intel Roadmap

2017-08-14 Thread Darrell Ball
That is the one in flight now.
I guess there are no others being planned, then?


-Original Message-
From: "Stokes, Ian" 
Date: Monday, August 14, 2017 at 2:56 AM
To: Jan Scheurich , Darrell Ball 
, "ovs-dev@openvswitch.org" 
Cc: "O Mahony, Billy" 
Subject: RE: [ovs-dev] OVS 2.9 Intel Roadmap

: RE: [ovs-dev] OVS 2.9 Intel Roadmap

> What about Billy's patch set for Rx queue prioritization?
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336001.html
> Jan

Good catch Jan, I missed that myself (Billy was out of office last week 
when putting the list together). I'll add this to the list.

Thanks
Ian

> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of Stokes, Ian
> > Sent: Monday, 14 August, 2017 10:45
> > To: Darrell Ball ; ovs-dev@openvswitch.org
> > Subject: Re: [ovs-dev] OVS 2.9 Intel Roadmap
> >
> > > Any prioritization or other qos features missing ?
> > >
> > Hi Darrell,
> >
> > Do you mean the 2 qos patchsets on the list currently (support for
> > queues and move qos in do_tx_copy?).
> >
> > If so, I didn’t include these in the list as Intel were not the
> > authors. I do plan to provide feedback on them though but the last few
> > weeks have been a bit hectic on my side.
> >
> > Are there any other QoS features you are referring to that are missing?
> >
> > Ian





Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Eric Garver
On Thu, Aug 10, 2017 at 09:21:15PM +0800, Yi Yang wrote:
> OVS master and the 2.8 branch have merged the NSH userspace
> patch series; this patch enables NSH support
> in the kernel datapath so that OVS can support
> NSH in the 2.8 release in compat mode by porting this.
> 
> Signed-off-by: Yi Yang 
> ---
>  drivers/net/vxlan.c  |   7 ++
>  include/net/nsh.h| 126 ++
>  include/uapi/linux/openvswitch.h |  33 
>  net/openvswitch/actions.c| 165 
> +++
>  net/openvswitch/flow.c   |  41 ++
>  net/openvswitch/flow.h   |   1 +
>  net/openvswitch/flow_netlink.c   |  54 -
>  7 files changed, 426 insertions(+), 1 deletion(-)
>  create mode 100644 include/net/nsh.h

Hi Yi,

In general I'd like to echo Jiri's comments on the netlink attributes.
I'd like to see the metadata separate.

I have a few other comments below.

Thanks.
Eric.

[..]
> diff --git a/include/net/nsh.h b/include/net/nsh.h
> new file mode 100644
> index 000..96477a1
> --- /dev/null
> +++ b/include/net/nsh.h
> @@ -0,0 +1,126 @@
> +#ifndef __NET_NSH_H
> +#define __NET_NSH_H 1
> +
> +
> +/*
> + * Network Service Header:
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |Ver|O|C|R|R|R|R|R|R|Length   |   MD Type   |  Next Proto   |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |Service Path ID| Service Index |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |   |
> + * ~   Mandatory/Optional Context Header   ~
> + * |   |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * Ver = The version field is used to ensure backward compatibility
> + *   going forward with future NSH updates.  It MUST be set to 0x0
> + *   by the sender, in this first revision of NSH.
> + *
> + * O = OAM. when set to 0x1 indicates that this packet is an operations
> + * and management (OAM) packet.  The receiving SFF and SFs nodes
> + * MUST examine the payload and take appropriate action.
> + *
> + * C = context. Indicates that a critical metadata TLV is present.
> + *
> + * Length : total length, in 4-byte words, of NSH including the Base
> + *  Header, the Service Path Header and the optional variable
> + *  TLVs.
> + * MD Type: indicates the format of NSH beyond the mandatory Base Header
> + *  and the Service Path Header.
> + *
> + * Next Protocol: indicates the protocol type of the original packet. A
> + *  new IANA registry will be created for protocol type.
> + *
> + * Service Path Identifier (SPI): identifies a service path.
> + *  Participating nodes MUST use this identifier for Service
> + *  Function Path selection.
> + *
> + * Service Index (SI): provides location within the SFP.
> + *
> + * [0] https://tools.ietf.org/html/draft-ietf-sfc-nsh-13
> + */
> +
> +/**
> + * struct nsh_md1_ctx - Keeps track of NSH context data
> + * @nshc<1-4>: NSH Contexts.
> + */
> +struct nsh_md1_ctx {
> + __be32 c[4];
> +};
> +
> +struct nsh_md2_tlv {
> + __be16 md_class;
> + u8 type;
> + u8 length;
> + u8 md_value[];
> +};
> +
> +struct nsh_hdr {
> + __be16 ver_flags_len;
> + u8 md_type;
> + u8 next_proto;
> + __be32 path_hdr;
> + union {
> + struct nsh_md1_ctx md1;
> + struct nsh_md2_tlv md2[0];
> + };
> +};
> +
> +/* Masking NSH header fields. */
> +#define NSH_VER_MASK   0xc000
> +#define NSH_VER_SHIFT  14
> +#define NSH_FLAGS_MASK 0x3fc0
> +#define NSH_FLAGS_SHIFT 6
> +#define NSH_LEN_MASK   0x003f
> +#define NSH_LEN_SHIFT  0
> +
> +#define NSH_SPI_MASK   0xffffff00
> +#define NSH_SPI_SHIFT  8
> +#define NSH_SI_MASK    0x000000ff
> +#define NSH_SI_SHIFT   0
> +
> +#define NSH_DST_PORT    4790 /* UDP Port for NSH on VXLAN. */
> +#define ETH_P_NSH   0x894F   /* Ethertype for NSH. */
> +
> +/* NSH Base Header Next Protocol. */
> +#define NSH_P_IPV4      0x01
> +#define NSH_P_IPV6      0x02
> +#define NSH_P_ETHERNET  0x03
> +#define NSH_P_NSH       0x04
> +#define NSH_P_MPLS      0x05
> +
> +/* MD Type Registry. */
> +#define NSH_M_TYPE1 0x01
> +#define NSH_M_TYPE2 0x02
> +#define NSH_M_EXP1  0xFE
> +#define NSH_M_EXP2  0xFF
> +
> +/* NSH Metadata Length. */
> +#define NSH_M_TYPE1_MDLEN 16
> +
> +/* NSH Base Header Length */
> +#define NSH_BASE_HDR_LEN  8
> +
> +/* NSH MD Type 1 header Length. */
> +#define NSH_M_TYPE1_LEN   24
> +
> +static inline u16
> +nsh_hdr_len(const struct nsh_hdr *nsh)
> +{
> + return 4 * (ntohs(nsh->ver_flags_len) & NSH_LEN_MASK) >> NSH_LEN_SHIFT;

This is doing the multiplication before the shift. It works only because
the shift is 0.

Re: [ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.

2017-08-14 Thread Jan Scheurich
> > We have tested the effect of turbo mode on TSC and there is none. The
> TSC frequency remains at the nominal clock speed, no matter if the core is
> clocked down or up. So, I believe for PMD threads (where performance
> matters) TSC would be an adequate and efficient clock.
> 
> It's highly platform dependent and testing on a few systems doesn't
> guarantee anything.
> On the other hand, POSIX guarantees the monotonic characteristics of
> CLOCK_MONOTONIC.

TSC is also monotonic on a given core. Does CLOCK_MONOTONIC guarantee any 
better accuracy than TSC for PMD threads?

> > On PMDs I am a bit concerned about the overhead/latency introduced
> with the clock_gettime() system call, but I haven't done any measurements
> to check the actual impact. Have you?
> 
> Have you seen my incremental patches?
> There is no overhead, because we're just replacing 'time_msec' with
> 'time_usec'.
> No difference except converting timespec to usec instead of msec.

I did look at your incremental patches and we will test their performance. I was 
concerned about the system call cost on master already before. Perhaps I'm 
paranoid, but I would like to double-check by testing.

> > If we go for CLOCK_MONOTONIC in microsecond resolution, we should
> make sure that the clock is read not more than once once every iteration
> (and cache the us value as now in the pmd ctx struct as suggested in your
> other patch). But then for consistency also the XPS feature should use the
> PMD time in us resolution.
> 
> Again, please, look at my incremental patches.

As far as I could see you did, for example, not consistently adapt 
tx_port->last_used to microsecond resolution.

> > For non-PMD thread we could actually skip time-based output batching
> completely. The packet rates and the frequency of calls to
> dpif_netdev_run() in the main ovs-vswitchd thread are so low that time-
> based flushing doesn't seem to make much sense.

Have you considered this option? 

> >
> > Below you can find an alternative incremental patch on top of your RFC
> 4/4 that uses TSC on PMD. We will be comparing the two alternatives for
> performance both with non-PMD guests (iperf3) as well as PMD guests
> (DPDK testpmd).
> 
> > In your version you need to move all the output_batching related code
> > under #ifdef DPDK_NETDEV because it will break userspace networking if
> > compiled without dpdk and output-max-latency != 0.

Not sure. Batching should implicitly be disabled because cycles_counter() and 
cycles_per_microsecond() would both return zero. But I agree that would be a 
fairly cryptic design. If we used TSC in PMDs, we should explicitly not do 
time-based tx batching on the non-PMD thread.

Anyway, if the cost of the clock_gettime() system call proves insignificant and 
our performance tests comparing our TSC-based with your CLOCK_MONOTONIC-based 
implementation show equivalent results, we can go for your approach.

BR, Jan


Re: [ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.

2017-08-14 Thread Ilya Maximets
On 14.08.2017 16:12, Jan Scheurich wrote:
>>> From earlier in-house trials we know we need to target flush times of 50
>> us or less, so we clearly need better time resolution. Sub-ms timing in PMD
>> should be based on TSC cycles, which are already kept in the pmd struct.
>> Could you provide a corresponding patch for performance testing?
>>
>> I don't think that TSC is suitable in this case. Some reasons:
>>
>> * non-PMD threads are able to float across cpu cores.
>> * Turbo-boost can be enabled or frequency can be adjusted manually after
>> DPDK init.
>> * TSC cycles only calculated if DPDK enabled.
>>
>> TSC is used currently only for not really precise statistics.
>> For the real features we need more accurate time accounting.
>>
>> I believe that CLOCK_MONOTONIC is able to provide at least microsecond
>> granularity on the most of systems. We just need to add one more wrapper
>> function like 'time_usec()' to the lib/timeval.
>>
> 
> We have tested the effect of turbo mode on TSC and there is none. The TSC 
> frequency remains at the nominal clock speed, no matter if the core is 
> clocked down or up. So, I believe for PMD threads (where performance matters) 
> TSC would be an adequate and efficient clock.

It's highly platform dependent, and testing on a few systems doesn't guarantee 
anything.
On the other hand, POSIX guarantees the monotonic characteristics of 
CLOCK_MONOTONIC.

> 
> On PMDs I am a bit concerned about the overhead/latency introduced with the 
> clock_gettime() system call, but I haven't done any measurements to check the 
> actual impact. Have you?

Have you seen my incremental patches?
There is no overhead, because we're just replacing 'time_msec' with 'time_usec'.
No difference except converting timespec to usec instead of msec.

> 
> If we go for CLOCK_MONOTONIC in microsecond resolution, we should make sure 
> that the clock is read not more than once once every iteration (and cache the 
> us value as now in the pmd ctx struct as suggested in your other patch). But 
> then for consistency also the XPS feature should use the PMD time in us 
> resolution.

Again, please, look at my incremental patches.

> 
> For non-PMD thread we could actually skip time-based output batching 
> completely. The packet rates and the frequency of calls to dpif_netdev_run() 
> in the main ovs-vswitchd thread are so low that time-based flushing doesn't 
> seem to make much sense.
> 
> Below you can find an alternative incremental patch on top of your RFC 4/4 
> that uses TSC on PMD. We will be comparing the two alternatives for 
> performance both with non-PMD guests (iperf3) as well as PMD guests (DPDK 
> testpmd).

In your version you need to move all the output_batching related code
under #ifdef DPDK_NETDEV because it will break userspace networking if
compiled without dpdk and output-max-latency != 0.

> 
> BR, Jan
> 
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 0d78ae4..8285786 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -265,7 +265,7 @@ struct dp_netdev {
>  struct hmap ports;
>  struct seq *port_seq;   /* Incremented whenever a port changes. */
>  
> -/* The time that a packet can wait in output batch for sending. */
> +/* Time in cycles that a packet can wait in output batch for sending. */
>  atomic_uint32_t output_max_latency;
>  
>  /* Meters. */
> @@ -508,7 +508,7 @@ struct tx_port {
>  int qid;
>  long long last_used;
>  struct hmap_node node;
> -long long output_time;
> +long long flush_time;   /* Time in TSC cycles to flush output buffer. */
>  struct dp_packet_batch output_pkts;
>  };
>  
> @@ -622,6 +622,7 @@ struct dpif_netdev {
>  uint64_t last_port_seq;
>  };
>  
> +static inline unsigned long cycles_per_microsecond(void);
>  static int get_port_by_number(struct dp_netdev *dp, odp_port_t port_no,
>struct dp_netdev_port **portp)
>  OVS_REQUIRES(dp->port_mutex);
> @@ -2921,11 +2922,12 @@ dpif_netdev_set_config(struct dpif *dpif, const 
> struct smap *other_config)
>  uint32_t output_max_latency, cur_max_latency;
>  
>  output_max_latency = smap_get_int(other_config, "output-max-latency",
> -  DEFAULT_OUTPUT_MAX_LATENCY);
> +  DEFAULT_OUTPUT_MAX_LATENCY)
> + * cycles_per_microsecond();
>  atomic_read_relaxed(&dp->output_max_latency, &cur_max_latency);
>  if (output_max_latency != cur_max_latency) {
>  atomic_store_relaxed(&dp->output_max_latency, output_max_latency);
> -VLOG_INFO("Output maximum latency set to %"PRIu32" ms",
> +VLOG_INFO("Output maximum latency set to %"PRIu32" cycles",
>output_max_latency);
>  }
>  
> @@ -3091,6 +3093,16 @@ cycles_counter(void)
>  #endif
>  }
>  
> +static inline unsigned long
> +cycles_per_microsecond(void)
> +{
> +#ifdef DPDK_NETDEV
> +return rte_get_tsc_hz() / (1000 * 1000);

Re: [ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.

2017-08-14 Thread Jan Scheurich
> > From earlier in-house trials we know we need to target flush times of 50
> us or less, so we clearly need better time resolution. Sub-ms timing in PMD
> should be based on TSC cycles, which are already kept in the pmd struct.
> Could you provide a corresponding patch for performance testing?
> 
> I don't think that TSC is suitable in this case. Some reasons:
> 
> * non-PMD threads are able to float across cpu cores.
> * Turbo-boost can be enabled or frequency can be adjusted manually after
> DPDK init.
> * TSC cycles only calculated if DPDK enabled.
> 
> TSC is used currently only for not really precise statistics.
> For the real features we need more accurate time accounting.
> 
> I believe that CLOCK_MONOTONIC is able to provide at least microsecond
> granularity on the most of systems. We just need to add one more wrapper
> function like 'time_usec()' to the lib/timeval.
> 

We have tested the effect of turbo mode on TSC and there is none. The TSC 
frequency remains at the nominal clock speed, no matter if the core is clocked 
down or up. So, I believe for PMD threads (where performance matters) TSC would 
be an adequate and efficient clock.

On PMDs I am a bit concerned about the overhead/latency introduced with the 
clock_gettime() system call, but I haven't done any measurements to check the 
actual impact. Have you?
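For reference, a minimal sketch of the proposed 'time_usec()' wrapper,
assuming the plain POSIX clock_gettime() interface (the real lib/timeval
helper would integrate with OVS's existing time caching):

    #include <time.h>

    long long
    time_usec(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long) ts.tv_sec * 1000 * 1000 + ts.tv_nsec / 1000;
    }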

If we go for CLOCK_MONOTONIC in microsecond resolution, we should make sure 
that the clock is read not more than once once every iteration (and cache the 
us value as now in the pmd ctx struct as suggested in your other patch). But 
then for consistency also the XPS feature should use the PMD time in us 
resolution.

For non-PMD thread we could actually skip time-based output batching 
completely. The packet rates and the frequency of calls to dpif_netdev_run() in 
the main ovs-vswitchd thread are so low that time-based flushing doesn't seem 
to make much sense.

Below you can find an alternative incremental patch on top of your RFC 4/4 that 
uses TSC on PMD. We will be comparing the two alternatives for performance both 
with non-PMD guests (iperf3) as well as PMD guests (DPDK testpmd).

BR, Jan


diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0d78ae4..8285786 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -265,7 +265,7 @@ struct dp_netdev {
 struct hmap ports;
 struct seq *port_seq;   /* Incremented whenever a port changes. */
 
-/* The time that a packet can wait in output batch for sending. */
+/* Time in cycles that a packet can wait in output batch for sending. */
 atomic_uint32_t output_max_latency;
 
 /* Meters. */
@@ -508,7 +508,7 @@ struct tx_port {
 int qid;
 long long last_used;
 struct hmap_node node;
-long long output_time;
+long long flush_time;   /* Time in TSC cycles to flush output buffer. */
 struct dp_packet_batch output_pkts;
 };
 
@@ -622,6 +622,7 @@ struct dpif_netdev {
 uint64_t last_port_seq;
 };
 
+static inline unsigned long cycles_per_microsecond(void);
 static int get_port_by_number(struct dp_netdev *dp, odp_port_t port_no,
   struct dp_netdev_port **portp)
 OVS_REQUIRES(dp->port_mutex);
@@ -2921,11 +2922,12 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 uint32_t output_max_latency, cur_max_latency;
 
 output_max_latency = smap_get_int(other_config, "output-max-latency",
-  DEFAULT_OUTPUT_MAX_LATENCY);
+  DEFAULT_OUTPUT_MAX_LATENCY)
+ * cycles_per_microsecond();
 atomic_read_relaxed(&dp->output_max_latency, &cur_max_latency);
 if (output_max_latency != cur_max_latency) {
 atomic_store_relaxed(&dp->output_max_latency, output_max_latency);
-VLOG_INFO("Output maximum latency set to %"PRIu32" ms",
+VLOG_INFO("Output maximum latency set to %"PRIu32" cycles",
   output_max_latency);
 }
 
@@ -3091,6 +3093,16 @@ cycles_counter(void)
 #endif
 }
 
+static inline unsigned long
+cycles_per_microsecond(void)
+{
+#ifdef DPDK_NETDEV
+return rte_get_tsc_hz() / (1000 * 1000);
+#else
+return 0;
+#endif
+}
+
 /* Fake mutex to make sure that the calls to cycles_count_* are balanced */
 extern struct ovs_mutex cycles_counter_fake_mutex;
 
@@ -3171,7 +3183,7 @@ dp_netdev_pmd_flush_output_packets(struct 
dp_netdev_pmd_thread *pmd,
 
 HMAP_FOR_EACH (p, node, &pmd->send_port_cache) {
 if (!dp_packet_batch_is_empty(&p->output_pkts)
-&& (force || p->output_time <= now)) {
+&& (force || p->flush_time <= pmd->last_cycles)) {
 output_cnt += dp_netdev_pmd_flush_output_on_port(pmd, p, now);
 }
 }
@@ -3196,7 +3208,6 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread 
*pmd,
 
 batch_cnt = batch.count;
 dp_netdev_input(pmd, , port_no, now);
-output_cnt = dp_netdev_pmd_flush_output_packets(pmd, now, false);
 } 

Re: [ovs-dev] [PATCH 0/4] Add offload support for action set

2017-08-14 Thread Simon Horman
On Tue, Aug 08, 2017 at 05:21:50PM +0300, Roi Dayan wrote:
> Hi,
> 
> This series adds support for offloading action set using
> tc interface.

Other than some minor nits regarding the 3rd patch, which I posted
in response to that patch, this series looks good to me. And this
is a feature I am very pleased to see.

Acked-by: Simon Horman 

> 
> Thanks,
> Roi
> 
> 
> Paul Blakey (4):
>   compat: Add act_pedit compatibility for old kernels
>   odp-util: Expose ovs flow key attr len table for reuse
>   tc: Add header rewrite using tc pedit action
>   netdev-tc-offloads: Add support for action set

...


Re: [ovs-dev] [PATCH 3/4] tc: Add header rewrite using tc pedit action

2017-08-14 Thread Simon Horman
On Tue, Aug 08, 2017 at 05:21:53PM +0300, Roi Dayan wrote:
> From: Paul Blakey 
> 
> To be later used to implement ovs action set offloading.
> 
> Signed-off-by: Paul Blakey 
> Reviewed-by: Roi Dayan 
> ---
>  lib/tc.c | 372 
> ++-
>  lib/tc.h |  12 +++
>  2 files changed, 381 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/tc.c b/lib/tc.c

...

> @@ -589,6 +758,10 @@ nl_parse_single_action(struct nlattr *action, struct 
> tc_flower *flower)
>  nl_parse_act_vlan(act_options, flower);
>  } else if (!strcmp(act_kind, "tunnel_key")) {
>  nl_parse_act_tunnel_key(act_options, flower);
> +} else if (!strcmp(act_kind, "pedit")) {
> +nl_parse_act_pedit(act_options, flower);
> +} else if (!strcmp(act_kind, "csum")) {
> +/* not doing anything for now */
>  } else {
>  VLOG_ERR_RL(_rl, "unknown tc action kind: %s", act_kind);
>  return EINVAL;
> @@ -790,6 +963,48 @@ tc_get_tc_cls_policy(enum tc_offload_policy policy)
>  }
>  
>  static void
> +nl_msg_put_act_csum(struct ofpbuf *request, uint32_t flags)
> +{
> +size_t offset;
> +
> +nl_msg_put_string(request, TCA_ACT_KIND, "csum");
> +offset = nl_msg_start_nested(request, TCA_ACT_OPTIONS);
> +{
> +struct tc_csum parm = { .action = TC_ACT_PIPE,
> +.update_flags = flags };

I think it would be a bit nicer to move parm to the top of the function
and remove the extra { } scope it is currently inside.

> +
> +nl_msg_put_unspec(request, TCA_CSUM_PARMS, &parm, sizeof parm);
> +}
> +nl_msg_end_nested(request, offset);
> +}
> +
> +static void
> +nl_msg_put_act_pedit(struct ofpbuf *request, struct tc_pedit *parm,
> + struct tc_pedit_key_ex *ex)
> +{
> +size_t ksize = sizeof *parm + (parm->nkeys * sizeof(struct 
> tc_pedit_key));
> +size_t offset, offset_keys_ex, offset_key;
> +int i;
> +
> +nl_msg_put_string(request, TCA_ACT_KIND, "pedit");
> +offset = nl_msg_start_nested(request, TCA_ACT_OPTIONS);
> +{

The extra { } scope here seems unnecessary.

> +parm->action = TC_ACT_PIPE;
> +
> +nl_msg_put_unspec(request, TCA_PEDIT_PARMS_EX, parm, ksize);
> +offset_keys_ex = nl_msg_start_nested(request, TCA_PEDIT_KEYS_EX);
> +for (i = 0; i < parm->nkeys; i++, ex++) {
> +offset_key = nl_msg_start_nested(request, TCA_PEDIT_KEY_EX);
> +nl_msg_put_u16(request, TCA_PEDIT_KEY_EX_HTYPE, ex->htype);
> +nl_msg_put_u16(request, TCA_PEDIT_KEY_EX_CMD, ex->cmd);
> +nl_msg_end_nested(request, offset_key);
> +}
> +nl_msg_end_nested(request, offset_keys_ex);
> +}
> +nl_msg_end_nested(request, offset);
> +}
> +
> +static void
>  nl_msg_put_act_push_vlan(struct ofpbuf *request, uint16_t vid, uint8_t prio)
>  {
>  size_t offset;

...


Re: [ovs-dev] [patch_v2 2/3] dpif-netdev: Refactor some pmd stats.

2017-08-14 Thread Ilya Maximets
> The per-packet stats are presently overlapping in that
> the miss stats include the lost stats; make these stats non-overlapping
> for clarity and make this clear in the dp_stat_type enum.  This
> also eliminates the need to increment two 'miss' stats for a
> single packet.
> 
> The subtable lookup stat is renamed to make it
> clear that it relates to masked lookups.
> 
> The stats that total to the number of packets seen are defined
> in dp_stat_type and an api is created to total the stats in case
> these stats are further divided in the future.
> 
> Signed-off-by: Darrell Ball 
> ---
>  lib/dpif-netdev.c | 58 
> ---
>  1 file changed, 38 insertions(+), 20 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 17e1666..38f5203 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -323,12 +323,21 @@ static struct dp_netdev_port 
> *dp_netdev_lookup_port(const struct dp_netdev *dp,
>  OVS_REQUIRES(dp->port_mutex);
>  
>  enum dp_stat_type {
> -DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
> -DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
> -DP_STAT_MISS,   /* Packets that did not match. */
> -DP_STAT_LOST,   /* Packets not passed up to the client. */
> -DP_STAT_LOOKUP_HIT, /* Number of subtable lookups for flow table
> -   hits */
> +DP_STAT_EXACT_HIT,  /* Packets that had an exact match (emc). */
> +DP_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
> +DP_STAT_MISS,   /* Packets that did not match and were passed
> +   up to the client. */

This definition of 'DP_STAT_MISS' looks illogical to me. 'MISS' means
a cache miss (EMC or CLS), but with the new definition the name becomes
meaningless.
Also, there is an informal concept in OVS: 'execute a miss', which
means the upcall execution. And users expect the miss counter to reflect
the number of upcalls executed.

Maybe we can define something like DP_N_CACHE_STAT to split the cache
(EMC and CLS) hit/miss stats from others?

> +DP_STAT_LOST,   /* Packets that did not match and were not
> +   passed up to client. */
> +DP_N_TOT_PKT_STAT,  /* The above statistics account for the total
> +   number of packets seen and should be
> +   non overlapping with each other. */
> +DP_STAT_MASKED_LOOKUP_HIT = DP_N_TOT_PKT_STAT,  /* Number of subtable
> +   lookups for flow table
> +   hits. Each MASKED_HIT
> +   hit will have >= 1
> +   MASKED_LOOKUP_HIT
> +   hit(s). */

Do we need the '_HIT' suffix for DP_STAT_MASKED_LOOKUP_HIT?
This is kind of a strange name, because we're counting lookups, not hits.


>  DP_N_STATS
>  };
>  
> @@ -749,13 +758,22 @@ enum pmd_info_type {
>  PMD_INFO_SHOW_RXQ /* Show poll-lists of pmd threads. */
>  };
>  
> +static unsigned long long
> +dp_netdev_calcul_total_packets(unsigned long long *stats)
> +{
> +unsigned long long total_packets = 0;
> +for (uint8_t i = 0; i < DP_N_TOT_PKT_STAT; i++) {
> +total_packets += stats[i];
> +}
> +return total_packets;
> +}
> +
>  static void
>  pmd_info_show_stats(struct ds *reply,
>  struct dp_netdev_pmd_thread *pmd,
>  unsigned long long stats[DP_N_STATS],
>  uint64_t cycles[PMD_N_CYCLES])
>  {
> -unsigned long long total_packets;
>  uint64_t total_cycles = 0;
>  int i;
>  
> @@ -773,10 +791,6 @@ pmd_info_show_stats(struct ds *reply,
>  }
>  }
>  
> -/* Sum of all the matched and not matched packets gives the total.  */
> -total_packets = stats[DP_STAT_EXACT_HIT] + stats[DP_STAT_MASKED_HIT]
> -+ stats[DP_STAT_MISS];
> -
>  for (i = 0; i < PMD_N_CYCLES; i++) {
>  if (cycles[i] > pmd->cycles_zero[i]) {
> cycles[i] -= pmd->cycles_zero[i];
> @@ -804,7 +818,8 @@ pmd_info_show_stats(struct ds *reply,
>"\tmiss:%llu\n\tlost:%llu\n",
>stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
>stats[DP_STAT_MASKED_HIT] > 0
> -  ? (1.0*stats[DP_STAT_LOOKUP_HIT])/stats[DP_STAT_MASKED_HIT]
> +  ? (1.0 * stats[DP_STAT_MASKED_LOOKUP_HIT])
> + / stats[DP_STAT_MASKED_HIT]
>: 0,
>stats[DP_STAT_MISS], stats[DP_STAT_LOST]);
>  
> @@ -820,6 +835,9 @@ pmd_info_show_stats(struct ds *reply,
>cycles[PMD_CYCLES_PROCESSING],
>cycles[PMD_CYCLES_PROCESSING] / 

[ovs-dev] [PATCH RFC 2/2] dpif-netdev: Use microseconds granularity for output-max-latency.

2017-08-14 Thread Ilya Maximets
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c| 16 +---
 vswitchd/vswitch.xml |  5 +++--
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0d78ae4..cf1591c 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2825,7 +2825,7 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 struct dp_netdev *dp = get_dp_netdev(dpif);
 struct dp_netdev_pmd_thread *pmd;
 struct dp_packet_batch pp;
-long long now = time_msec();
+long long now = time_usec();
 
 if (dp_packet_size(execute->packet) < ETH_HEADER_LEN ||
 dp_packet_size(execute->packet) > UINT16_MAX) {
@@ -2925,7 +2925,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct 
smap *other_config)
 atomic_read_relaxed(&dp->output_max_latency, &cur_max_latency);
 if (output_max_latency != cur_max_latency) {
 atomic_store_relaxed(&dp->output_max_latency, output_max_latency);
-VLOG_INFO("Output maximum latency set to %"PRIu32" ms",
+VLOG_INFO("Output maximum latency set to %"PRIu32" us",
   output_max_latency);
 }
 
@@ -3166,7 +3166,7 @@ dp_netdev_pmd_flush_output_packets(struct 
dp_netdev_pmd_thread *pmd,
 }
 
 if (!now) {
-now = time_msec();
+now = time_usec();
 }
 
 HMAP_FOR_EACH (p, node, &pmd->send_port_cache) {
@@ -3190,7 +3190,7 @@ dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread 
*pmd,
 dp_packet_batch_init(&batch);
 error = netdev_rxq_recv(rx, &batch);
 if (!error) {
-long long now = time_msec();
+long long now = time_usec();
 
 *recirc_depth_get() = 0;
 
@@ -3768,7 +3768,7 @@ dpif_netdev_run(struct dpif *dpif)
 }
 
 cycles_count_end(non_pmd, PMD_CYCLES_IDLE);
-dpif_netdev_xps_revalidate_pmd(non_pmd, time_msec(), false);
+dpif_netdev_xps_revalidate_pmd(non_pmd, time_usec(), false);
 ovs_mutex_unlock(&dp->non_pmd_mutex);
 
 dp_netdev_pmd_unref(non_pmd);
@@ -4742,7 +4742,7 @@ packet_batch_per_flow_execute(struct 
packet_batch_per_flow *batch,
 struct dp_netdev_flow *flow = batch->flow;
 
 dp_netdev_flow_used(flow, batch->array.count, batch->byte_count,
-batch->tcp_flags, now);
+batch->tcp_flags, now / 1000);
 
 actions = dp_netdev_flow_get_actions(flow);
 
@@ -5111,7 +5111,7 @@ dpif_netdev_xps_revalidate_pmd(const struct 
dp_netdev_pmd_thread *pmd,
 if (!tx->port->dynamic_txqs) {
 continue;
 }
-interval = now - tx->last_used;
+interval = now / 1000 - tx->last_used;
 if (tx->qid >= 0 && (purge || interval >= XPS_TIMEOUT_MS)) {
 port = tx->port;
 ovs_mutex_lock(&port->txq_used_mutex);
@@ -5132,6 +5132,8 @@ dpif_netdev_xps_get_tx_qid(const struct 
dp_netdev_pmd_thread *pmd,
 
 if (OVS_UNLIKELY(!now)) {
 now = time_msec();
+} else {
+now /= 1000;
 }
 
 interval = now - tx->last_used;
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 23930f0..1c6ae7c 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -345,9 +345,10 @@
   
 
   
+  type='{"type": "integer",
+ "minInteger": 0, "maxInteger": 100}'>
 
-  Specifies the time in milliseconds that a packet can wait in output
+  Specifies the time in microseconds that a packet can wait in output
   batch for sending i.e. amount of time that packet can spend in an
   intermediate output queue before sending to netdev.
   This option can be used to configure balance between throughput
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH RFC 1/2] timeval: Introduce time_usec().

2017-08-14 Thread Ilya Maximets
This function will provide monotonic time in microseconds.

Signed-off-by: Ilya Maximets 
---
 lib/timeval.c | 22 ++
 lib/timeval.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/lib/timeval.c b/lib/timeval.c
index dd63f03..be2eddc 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -233,6 +233,22 @@ time_wall_msec(void)
 return time_msec__(&wall_clock);
 }
 
+static long long int
+time_usec__(struct clock *c)
+{
+struct timespec ts;
+
+time_timespec__(c, &ts);
+return timespec_to_usec(&ts);
+}
+
+/* Returns a monotonic timer, in microseconds. */
+long long int
+time_usec(void)
+{
+return time_usec__(&monotonic_clock);
+}
+
 /* Configures the program to die with SIGALRM 'secs' seconds from now, if
  * 'secs' is nonzero, or disables the feature if 'secs' is zero. */
 void
@@ -360,6 +376,12 @@ timeval_to_msec(const struct timeval *tv)
 return (long long int) tv->tv_sec * 1000 + tv->tv_usec / 1000;
 }
 
+long long int
+timespec_to_usec(const struct timespec *ts)
+{
+return (long long int) ts->tv_sec * 1000 * 1000 + ts->tv_nsec / 1000;
+}
+
 /* Returns the monotonic time at which the "time" module was initialized, in
  * milliseconds. */
 long long int
diff --git a/lib/timeval.h b/lib/timeval.h
index 7957dad..8e74551 100644
--- a/lib/timeval.h
+++ b/lib/timeval.h
@@ -54,6 +54,7 @@ time_t time_now(void);
 time_t time_wall(void);
 long long int time_msec(void);
 long long int time_wall_msec(void);
+long long int time_usec(void);
 void time_timespec(struct timespec *);
 void time_wall_timespec(struct timespec *);
 void time_alarm(unsigned int secs);
@@ -62,6 +63,7 @@ int time_poll(struct pollfd *, int n_pollfds, HANDLE *handles,
 
 long long int timespec_to_msec(const struct timespec *);
 long long int timeval_to_msec(const struct timeval *);
+long long int timespec_to_usec(const struct timespec *);
 
 struct tm_msec *localtime_msec(long long int now, struct tm_msec *result);
 struct tm_msec *gmtime_msec(long long int now, struct tm_msec *result);
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.

2017-08-14 Thread Ilya Maximets
On 14.08.2017 02:33, Jan Scheurich wrote:
>> This allows to collect packets from more than one RX burst
>> and send them together with a configurable maximum latency.
>>
>> 'other_config:output-max-latency' can be used to configure
>> time that a packet can wait in output batch for sending.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>> Notes:
>>
>> * This is an RFC and should not be used for performance testing.
>> * Millisecond granularity is used for now. Can be easily switched
>>   to use microseconds instead.
> 
> From earlier in-house trials we know we need to target flush times of 50 us
> or less, so we clearly need better time resolution. Sub-ms timing in PMD
> should be based on TSC cycles, which are already kept in the pmd struct.
> Could you provide a corresponding patch for performance testing?

I don't think that TSC is suitable in this case. Some reasons:

* non-PMD threads are able to float across cpu cores.
* Turbo-boost can be enabled or frequency can be adjusted manually after DPDK 
init.
* TSC cycles are only calculated if DPDK is enabled.

TSC is currently used only for statistics that need not be very precise.
For the real features we need more accurate time accounting.

I believe that CLOCK_MONOTONIC is able to provide at least microsecond
granularity on most systems. We just need to add one more wrapper
function like 'time_usec()' to the lib/timeval.

I'll send corresponding WIP incremental patches in reply to this message.
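
As a quick standalone check of that claim (illustrative only, not part
of any patch), the achievable resolution can be queried directly:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec res;

        /* On most modern Linux systems this reports 1 ns, comfortably
         * below the microsecond granularity discussed above. */
        if (!clock_getres(CLOCK_MONOTONIC, &res)) {
            printf("CLOCK_MONOTONIC resolution: %ld ns\n",
                   (long) res.tv_nsec);
        }
        return 0;
    }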

> 
>>
>>  lib/dpif-netdev.c| 121
>> ++-
>>  vswitchd/vswitch.xml |  15 +++
>>  2 files changed, 115 insertions(+), 21 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index dcf55f3..0d78ae4 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -85,6 +85,9 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>>  #define MAX_RECIRC_DEPTH 5
>>  DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
>>
>> +/* Use instant packet send by default. */
>> +#define DEFAULT_OUTPUT_MAX_LATENCY 0
>> +
>>  /* Configuration parameters. */
>>  enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow
>> table. */
>>  enum { MAX_METERS = 65536 };/* Maximum number of meters. */
>> @@ -262,6 +265,9 @@ struct dp_netdev {
>>  struct hmap ports;
>>  struct seq *port_seq;   /* Incremented whenever a port changes. */
>>
>> +/* The time that a packet can wait in output batch for sending. */
>> +atomic_uint32_t output_max_latency;
>> +
>>  /* Meters. */
>>  struct ovs_mutex meter_locks[N_METER_LOCKS];
>>  struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
>> @@ -502,6 +508,7 @@ struct tx_port {
>>  int qid;
>>  long long last_used;
>>  struct hmap_node node;
>> +long long output_time;
> 
> Rename to flush_time?

Maybe. Why do you think it's better?

> 
>>  struct dp_packet_batch output_pkts;
>>  };
>>
>> @@ -574,6 +581,9 @@ struct dp_netdev_pmd_thread {
>>   * than 'cmap_count(dp->poll_threads)'. */
>>  uint32_t static_tx_qid;
>>
>> +/* Number of filled output batches. */
>> +int n_output_batches;
>> +
>>  struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'.
>> */
>>  /* List of rx queues to poll. */
>>  struct hmap poll_list OVS_GUARDED;
>> @@ -669,9 +679,9 @@ static void dp_netdev_add_rxq_to_pmd(struct
>> dp_netdev_pmd_thread *pmd,
>>  static void dp_netdev_del_rxq_from_pmd(struct dp_netdev_pmd_thread
>> *pmd,
>> struct rxq_poll *poll)
>>  OVS_REQUIRES(pmd->port_mutex);
>> -static void
>> +static int
>>  dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread
>> *pmd,
>> -   long long now);
>> +   long long now, bool force);
>>  static void reconfigure_datapath(struct dp_netdev *dp)
>>  OVS_REQUIRES(dp->port_mutex);
>>  static bool dp_netdev_pmd_try_ref(struct dp_netdev_pmd_thread *pmd);
>> @@ -1193,6 +1203,7 @@ create_dp_netdev(const char *name, const
>> struct dpif_class *class,
>>  conntrack_init(&dp->conntrack);
>>
>> +atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
>> +atomic_init(&dp->output_max_latency,
>> DEFAULT_OUTPUT_MAX_LATENCY);
>>
>>  cmap_init(&dp->poll_threads);
>>
>> @@ -2858,7 +2869,7 @@ dpif_netdev_execute(struct dpif *dpif, struct
>> dpif_execute *execute)
>>  dp_packet_batch_init_packet(&pp, execute->packet);
>>  dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
>>execute->actions, execute->actions_len, now);
>> -dp_netdev_pmd_flush_output_packets(pmd, now);
>> +dp_netdev_pmd_flush_output_packets(pmd, now, true);
>>
>>  if (pmd->core_id == NON_PMD_CORE_ID) {
>>  ovs_mutex_unlock(>non_pmd_mutex);
>> @@ -2907,6 +2918,16 @@ dpif_netdev_set_config(struct dpif *dpif, const
>> struct smap *other_config)
>>  smap_get_ullong(other_config, 

Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Monday, 14 August, 2017 12:48
> 
> On Mon, 14 Aug 2017 10:35:42 +, Jan Scheurich wrote:
> > Is it worth speculating on how a hypothetical future NSH version
> > (with a different Version value in the Base header) might look?
> 
> Absolutely. This is uAPI we're talking about and once merged, it's set
> in stone. Whatever we come up with should allow future extensibility.
> 
> > If this should ever arise, we could introduce a new push_nsh_v2
> > action.
> 
> Which would mean we failed with the design. We would have to maintain
> two parallel APIs forever. This is not how the design should be made.

Well, if that is the ultimate design goal, we'll have no choice.

But if the hypothetical next NSH version looks entirely different, we will have
to manage the incompatibility anyhow at the level of the nested attributes. So
the only thing we will manage to keep fixed for sure might be the empty wrapper
OVS_ACTION_ATTR_PUSH_NSH...

We decided earlier on to go for dedicated push/pop_ actions in the 
datapath instead of a single pair of (very polymorphic) generic encap/decap 
datapath actions to make the implementation in the datapath more explicit and 
straightforward. Why not follow the same philosophy also when we need to 
support push/pop for incompatible versions of the same protocol?

Jan

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jiri Benc
On Mon, 14 Aug 2017 10:35:42 +, Jan Scheurich wrote:
> Is it worth speculating on how a hypothetical future NSH version
> (with a different Version value in the Base header) might look?

Absolutely. This is uAPI we're talking about and once merged, it's set
in stone. Whatever we come up with should allow future extensibility.

> If this should ever arise, we could introduce a new push_nsh_v2
> action.

Which would mean we failed with the design. We would have to maintain
two parallel APIs forever. This is not how the design should be made.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Monday, 14 August, 2017 09:51
> 
> On Sun, 13 Aug 2017 21:13:57 +, Jan Scheurich wrote:
> > Jiri, I am not too familiar with conventions on the OVS netlink
> > interface regarding the handling of variable length fields. What is
> > the benefit of structuring the push_nsh action into
> >
> > OVS_ACTION_ATTR_PUSH_NSH
> > +-- OVS_ACTION_ATTR_NSH_BASE_HEADER
> > +-- OVS_ACTION_ATTR_NSH_MD
> >
> > instead of grouping the base header fields and the variable length MD
> > into a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the
> > main concern a potential future need to add additional parameters to
> > the push_nsh datapath action? Are there examples for such structured
> > actions other than OVS_ACTION_ATTR_CT where the need is obvious
> > because it is very polymorphic?
> 
> This is about having the design clean. What would you do if you had two
> variable length fields? Would you still put them in a single structure
> and have a length field in the structure, too? That would be wrong, we
> have length in the attribute header. I doubt you would do that. Which
> indicates that putting variable length fields to an attribute with
> anything else is wrong.
> 
> Also, look at how ugly the code would be. You'd have to subtract the
> base header length from the attribute length to get the variable data
> length. That's not nice at all.
> 
> Think about the netlink in the way that by default, every field should
> have its own attribute. Structures are included only for performance
> reasons where certain fields are always passed in a message and having
> them in separate attributes would be impractical and waste of space.
> Going out of your way to include the context data in the structure thus
> doesn't make sense.
> 
> > BTW:  The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading
> > because
> > in the NSH draft the term base header is used for the first 32-bit
> > word, whereas here it includes also the 32-bit Service Path header.
> 
> An excellent comment. This means that it's very well possible that
> future NSH versions may not include SP header or may have it of a
> different size. Maybe we should consider putting it into a separate
> attribute, too? Not sure it is needed, though, I don't think it's
> likely to happen.

I think the NSH RFC draft is pretty clear that the NSH header (Version 0) will 
always contain the Base Header, the Service Header and a (possibly empty) set 
of Context Headers. The length of the context headers is implied by the total 
NSH header Length in the Base Header. The internal structure of the context 
headers, implied by the "MD type" in the base header, is irrelevant for the 
push_nsh action as it is managed by the ofproto-dpif layer. The current NSH 
header format simply does not allow for a second variable length section in the 
header.
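
As an illustration of that point, the context header length is fully
derived from the Base Header. A short sketch (the accessor name is
hypothetical; the real field masks live in the NSH header definitions):

    /* The Length field counts 4-byte words and covers the whole NSH
     * header: 4 bytes of Base Header, 4 bytes of Service Path Header,
     * plus the context headers. */
    size_t nsh_len = nsh_hdr_len_words(nsh) * 4;  /* hypothetical helper */
    size_t md_len = nsh_len - 8;                  /* context headers only */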

Is it worth speculating on how a hypothetical future NSH version (with a
different Version value in the Base header) might look? If this should
ever arise, we could introduce a new push_nsh_v2 action.

In summary, I'm still not convinced that we gain any significant generality by 
splitting up the OVS_ACTION_ATTR_PUSH_NSH action into nested attributes. It 
will only slow down the execution of the action in the datapath (or require
some optimization in the datapath to merge the attributes into a single 
internal action struct).
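
As a sketch of what the monolithic alternative would look like (a
hypothetical layout for illustration, not a proposed uAPI):

    struct ovs_action_push_nsh {
        uint8_t flags;
        uint8_t mdtype;
        uint8_t next_proto;
        ovs_be32 path_hdr;     /* Service Path header. */
        uint8_t metadata[];    /* Context headers; length implied by the
                                * enclosing netlink attribute length. */
    };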

Note: For the OVS_KEY_ATTR_NSH we should split it up into
OVS_KEY_ATTR_NSH_FIXED_HEADER (covering Base and Service headers) and 
OVS_KEY_ATTR_NSH_MD1.

Jan
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS 2.9 Intel Roadmap

2017-08-14 Thread Stokes, Ian
: RE: [ovs-dev] OVS 2.9 Intel Roadmap
> 
> What about Billy's patch set for Rx queue prioritization?
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336001.html
> Jan
> 
Good catch Jan, I missed that myself (Billy was out of the office last week
when I was putting the list together). I'll add this to the list.

Thanks
Ian
> > -Original Message-
> > From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> > boun...@openvswitch.org] On Behalf Of Stokes, Ian
> > Sent: Monday, 14 August, 2017 10:45
> > To: Darrell Ball ; ovs-dev@openvswitch.org
> > Subject: Re: [ovs-dev] OVS 2.9 Intel Roadmap
> >
> > > Any prioritization or other QoS features missing?
> > >
> > Hi Darrell,
> >
> > Do you mean the 2 QoS patchsets currently on the list (support for
> > queues and moving QoS into do_tx_copy)?
> >
> > If so, I didn’t include these in the list as Intel were not the
> > authors. I do plan to provide feedback on them, though the last few
> > weeks have been a bit hectic on my side.
> >
> >
> > Are there any other QoS features you are referring to that are missing?
> >
> > Ian
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS 2.9 Intel Roadmap

2017-08-14 Thread Jan Scheurich
What about Billy's patch set for Rx queue prioritization?
https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/336001.html
Jan

> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of Stokes, Ian
> Sent: Monday, 14 August, 2017 10:45
> To: Darrell Ball ; ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] OVS 2.9 Intel Roadmap
> 
> > Any prioritization or other QoS features missing?
> >
> Hi Darrell,
> 
> Do you mean the 2 QoS patchsets currently on the list (support for queues
> and moving QoS into do_tx_copy)?
> 
> If so, I didn’t include these in the list as Intel were not the authors. I do
> plan to provide feedback on them, though the last few weeks have been a bit
> hectic on my side.
> 
> Are there any other QoS features you are referring to that are missing?
> 
> Ian
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS 2.9 Intel Roadmap

2017-08-14 Thread Stokes, Ian
> Any prioritization or other QoS features missing?
> 
Hi Darrell,

Do you mean the 2 QoS patchsets currently on the list (support for queues and
moving QoS into do_tx_copy)?

If so, I didn’t include these in the list as Intel were not the authors. I do
plan to provide feedback on them, though the last few weeks have been a bit
hectic on my side.

Are there any other QoS features you are referring to that are missing?

Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/7] Add offload support for ip ttl and tcp flags

2017-08-14 Thread Simon Horman
On Fri, Aug 11, 2017 at 11:43:23AM -0700, Joe Stringer wrote:
> On 8 August 2017 at 01:03, Simon Horman  wrote:
> > On Mon, Aug 07, 2017 at 06:19:04PM +0300, Roi Dayan wrote:
> >> Hi,
> >>
> >> This series adds support for offloading ip ttl and tcp flags
> >> using tc interface.
> >
> > This looks nice, thanks.
> >
> > Acked-by: Simon Horman 
> >
> > I'm also happy to apply this if someone else provides a review.
> 
> Thanks folks, I'll apply the series to master soon with Simon's acks
> and the following style fixup.

Thanks.

> diff --git a/lib/tc.c b/lib/tc.c
> index bf12a5bea494..c9cada249de3 100644
> --- a/lib/tc.c
> +++ b/lib/tc.c
> @@ -363,7 +363,6 @@ nl_parse_flower_ip(struct nlattr **attrs, struct
> tc_flower *flower) {
>  key->ip_ttl = nl_attr_get_u8(attrs[TCA_FLOWER_KEY_IP_TTL]);
>  mask->ip_ttl = nl_attr_get_u8(attrs[TCA_FLOWER_KEY_IP_TTL_MASK]);
>  }
> -
>  }
> 
>  static const struct nl_policy tunnel_key_policy[] = {
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jiri Benc
On Sun, 13 Aug 2017 21:13:57 +, Jan Scheurich wrote:
> Jiri, I am not too familiar with conventions on the OVS netlink
> interface regarding the handling of variable length fields. What is
> the benefit of structuring the push_nsh action into
> 
> OVS_ACTION_ATTR_PUSH_NSH
> +-- OVS_ACTION_ATTR_NSH_BASE_HEADER
> +-- OVS_ACTION_ATTR_NSH_MD
> 
> instead of grouping the base header fields and the variable length MD
> into a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the
> main concern a potential future need to add additional parameters to
> the push_nsh datapath action? Are there examples for such structured
> actions other than OVS_ACTION_ATTR_CT where the need is obvious
> because it is very polymorphic?

This is about having the design clean. What would you do if you had two
variable length fields? Would you still put them in a single structure
and have a length field in the structure, too? That would be wrong, we
have length in the attribute header. I doubt you would do that. Which
indicates that putting variable length fields to an attribute with
anything else is wrong.

Also, look at how ugly the code would be. You'd have to subtract the
base header length from the attribute length to get the variable data
length. That's not nice at all.

Think about the netlink in the way that by default, every field should
have its own attribute. Structures are included only for performance
reasons where certain fields are always passed in a message and having
them in separate attributes would be impractical and waste of space.
Going out of your way to include the context data in the structure thus
doesn't make sense.
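
To make the contrast concrete, the nested encoding could be built with
the usual helpers (the attribute names here are hypothetical; only the
nl_msg_*() calls are existing OVS API):

    size_t offset = nl_msg_start_nested(actions, OVS_ACTION_ATTR_PUSH_NSH);
    nl_msg_put_unspec(actions, OVS_NSH_ATTR_BASE_HDR, &base, sizeof base);
    nl_msg_put_unspec(actions, OVS_NSH_ATTR_MD, metadata, md_len);
    nl_msg_end_nested(actions, offset);

Each attribute carries its own length, so a parser never has to subtract
header sizes to find the variable-length part.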

> BTW:  The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because
> in the NSH draft the term base header is used for the first 32-bit
> word, whereas here it includes also the 32-bit Service Path header.

An excellent comment. This means that it's very well possible that
future NSH versions may not include SP header or may have it of a
different size. Maybe we should consider putting it into a separate
attribute, too? Not sure it is needed, though, I don't think it's
likely to happen.

 Jiri
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v6 2/3] ovn-controller: Add a new action - 'put_nd_ra_opts'

2017-08-14 Thread nusiddiq
From: Numan Siddique 

This patch adds a new OVN action 'put_nd_ra_opts' to support native
IPv6 Router Advertisement in OVN. This action can be used to respond
to the IPv6 Router Solicitation requests.

ovn-controller parses this action and adds a NXT_PACKET_IN2 OF flow
with 'pause' flag set and the RA options stored in 'userdata' field.
This action is similar to 'put_dhcp_opts' and 'put_dhcpv6_opts'.

When a valid IPv6 RS packet is received by the pinctrl module of
ovn-controller, it frames a new RA packet, sets the RA options
from the 'userdata' field, and resumes the packet, storing 1 in the
1-bit result sub-field. If the packet is invalid, it resumes the
packet without any modifications, storing 0 in the 1-bit result
sub-field.

Eg. reg0[5] = put_nd_ra_opts(address_mode = "slaac", mtu = 1450,
 slla = 01:02:03:04:05:06, prefix = aef0::/64)

Note that unlike DHCPv4/v6, a new table to store the supported IPv6 ND RA
options is not added in SB DB since there are only 3 ND RA options.

Co-authored-by: Zongkai LI 
Signed-off-by: Zongkai LI 
Signed-off-by: Numan Siddique 
---
 include/ovn/actions.h |  14 +++-
 ovn/controller/lflow.c|  12 ++-
 ovn/controller/pinctrl.c  |  96 +++
 ovn/lib/actions.c | 194 +-
 ovn/lib/ovn-l7.h  |  48 
 ovn/ovn-sb.xml|  77 ++
 ovn/utilities/ovn-trace.c |  46 +++
 tests/ovn.at  |  31 
 tests/test-ovn.c  |  13 +++-
 9 files changed, 511 insertions(+), 20 deletions(-)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index d13a3747b..15cee478d 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -72,7 +72,8 @@ struct simap;
 OVNACT(PUT_DHCPV6_OPTS,   ovnact_put_opts)\
 OVNACT(SET_QUEUE, ovnact_set_queue)   \
 OVNACT(DNS_LOOKUP,ovnact_dns_lookup)  \
-OVNACT(LOG,   ovnact_log)
+OVNACT(LOG,   ovnact_log) \
+OVNACT(PUT_ND_RA_OPTS,ovnact_put_opts)
 
 /* enum ovnact_type, with a member OVNACT_ for each action. */
 enum OVS_PACKED_ENUM ovnact_type {
@@ -418,6 +419,14 @@ enum action_opcode {
  *   - A variable length string containing the name.
  */
 ACTION_OPCODE_LOG,
+
+/* "result = put_nd_ra_opts(option, ...)".
+ * Arguments follow the action_header, in this format:
+ *   - A 32-bit or 64-bit OXM header designating the result field.
+ *   - A 32-bit integer specifying a bit offset within the result field.
+ *   - Any number of ICMPv6 options.
+ */
+ACTION_OPCODE_PUT_ND_RA_OPTS,
 };
 
 /* Header. */
@@ -438,6 +447,9 @@ struct ovnact_parse_params {
 /* hmap of 'struct gen_opts_map'  to support 'put_dhcpv6_opts' action */
 const struct hmap *dhcpv6_opts;
 
+/* hmap of 'struct gen_opts_map' to support 'put_nd_ra_opts' action */
+const struct hmap *nd_ra_opts;
+
 /* Each OVN flow exists in a logical table within a logical pipeline.
  * These parameters express this context for a set of OVN actions being
  * parsed:
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 875d326e0..9b53519dd 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -65,6 +65,7 @@ static void consider_logical_flow(struct controller_ctx *ctx,
   const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
+  struct hmap *nd_ra_opts,
   uint32_t *conj_id_ofs,
   const struct shash *addr_sets,
   struct hmap *flow_table,
@@ -165,16 +166,21 @@ add_logical_flows(struct controller_ctx *ctx,
 dhcpv6_opt_row->type);
 }
 
+struct hmap nd_ra_opts = HMAP_INITIALIZER(&nd_ra_opts);
+nd_ra_opts_init(&nd_ra_opts);
+
 SBREC_LOGICAL_FLOW_FOR_EACH (lflow, ctx->ovnsb_idl) {
 consider_logical_flow(ctx, chassis_index,
   lflow, local_datapaths,
   group_table, chassis,
-  &dhcp_opts, &dhcpv6_opts, &conj_id_ofs,
-  addr_sets, flow_table, active_tunnels);
+  &dhcp_opts, &dhcpv6_opts, &nd_ra_opts,
+  &conj_id_ofs, addr_sets, flow_table,
+  active_tunnels);
 }
 
 dhcp_opts_destroy(&dhcp_opts);
 dhcp_opts_destroy(&dhcpv6_opts);
+nd_ra_opts_destroy(&nd_ra_opts);
 }
 
 static void
@@ -186,6 +192,7 @@ consider_logical_flow(struct controller_ctx *ctx,
   const struct sbrec_chassis *chassis,
   struct hmap *dhcp_opts,
   struct hmap *dhcpv6_opts,
+   

[ovs-dev] [PATCH v6 1/3] ovn util: Refactor dhcp_opts_map to make it generic

2017-08-14 Thread nusiddiq
From: Numan Siddique 

Renamed 'struct dhcp_opts_map' to 'struct gen_opts_map' and
renamed ovn-dhcp.h to ovn-l7.h. An upcoming commit to support IPv6
Router Advertisement, will make use of the refactored code to store
the IPv6 ND RA options in 'struct gen_opts_map'.

Signed-off-by: Numan Siddique 
Acked-by: Miguel Angel Ajo 
---
 include/ovn/actions.h|  16 +++
 ovn/controller/lflow.c   |   2 +-
 ovn/controller/pinctrl.c |   2 +-
 ovn/lib/actions.c| 100 ++-
 ovn/lib/automake.mk  |   2 +-
 ovn/lib/{ovn-dhcp.h => ovn-l7.h} |  67 ++
 ovn/northd/ovn-northd.c  |  14 +++---
 ovn/utilities/ovn-trace.c|  10 ++--
 tests/test-ovn.c |   3 +-
 9 files changed, 127 insertions(+), 89 deletions(-)
 rename ovn/lib/{ovn-dhcp.h => ovn-l7.h} (78%)

diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 0a04af7aa..d13a3747b 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -68,8 +68,8 @@ struct simap;
 OVNACT(PUT_ARP,   ovnact_put_mac_bind)\
 OVNACT(GET_ND,ovnact_get_mac_bind)\
 OVNACT(PUT_ND,ovnact_put_mac_bind)\
-OVNACT(PUT_DHCPV4_OPTS,   ovnact_put_dhcp_opts)   \
-OVNACT(PUT_DHCPV6_OPTS,   ovnact_put_dhcp_opts)   \
+OVNACT(PUT_DHCPV4_OPTS,   ovnact_put_opts)\
+OVNACT(PUT_DHCPV6_OPTS,   ovnact_put_opts)\
 OVNACT(SET_QUEUE, ovnact_set_queue)   \
 OVNACT(DNS_LOOKUP,ovnact_dns_lookup)  \
 OVNACT(LOG,   ovnact_log)
@@ -234,16 +234,16 @@ struct ovnact_put_mac_bind {
 struct expr_field mac;  /* 48-bit Ethernet address. */
 };
 
-struct ovnact_dhcp_option {
-const struct dhcp_opts_map *option;
+struct ovnact_gen_option {
+const struct gen_opts_map *option;
 struct expr_constant_set value;
 };
 
 /* OVNACT_PUT_DHCPV4_OPTS, OVNACT_PUT_DHCPV6_OPTS. */
-struct ovnact_put_dhcp_opts {
+struct ovnact_put_opts {
 struct ovnact ovnact;
 struct expr_field dst;  /* 1-bit destination field. */
-struct ovnact_dhcp_option *options;
+struct ovnact_gen_option *options;
 size_t n_options;
 };
 
@@ -432,10 +432,10 @@ struct ovnact_parse_params {
  * expr_parse()). */
 const struct shash *symtab;
 
-/* hmap of 'struct dhcp_opts_map' to support 'put_dhcp_opts' action */
+/* hmap of 'struct gen_opts_map' to support 'put_dhcp_opts' action */
 const struct hmap *dhcp_opts;
 
-/* hmap of 'struct dhcp_opts_map'  to support 'put_dhcpv6_opts' action */
+/* hmap of 'struct gen_opts_map'  to support 'put_dhcpv6_opts' action */
 const struct hmap *dhcpv6_opts;
 
 /* Each OVN flow exists in a logical table within a logical pipeline.
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 6d9f02cb2..875d326e0 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -25,7 +25,7 @@
 #include "ovn-controller.h"
 #include "ovn/actions.h"
 #include "ovn/expr.h"
-#include "ovn/lib/ovn-dhcp.h"
+#include "ovn/lib/ovn-l7.h"
 #include "ovn/lib/ovn-sb-idl.h"
 #include "packets.h"
 #include "physical.h"
diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
index 469a35586..43e3cba23 100644
--- a/ovn/controller/pinctrl.c
+++ b/ovn/controller/pinctrl.c
@@ -41,7 +41,7 @@
 #include "ovn/lex.h"
 #include "ovn/lib/acl-log.h"
 #include "ovn/lib/logical-fields.h"
-#include "ovn/lib/ovn-dhcp.h"
+#include "ovn/lib/ovn-l7.h"
 #include "ovn/lib/ovn-util.h"
 #include "poll-loop.h"
 #include "rconn.h"
diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
index d0d73b69c..b2559cd07 100644
--- a/ovn/lib/actions.c
+++ b/ovn/lib/actions.c
@@ -20,7 +20,7 @@
 #include "bitmap.h"
 #include "byte-order.h"
 #include "compiler.h"
-#include "ovn-dhcp.h"
+#include "ovn-l7.h"
 #include "hash.h"
 #include "logical-fields.h"
 #include "nx-match.h"
@@ -1374,19 +1374,17 @@ ovnact_put_mac_bind_free(struct ovnact_put_mac_bind 
*put_mac OVS_UNUSED)
 }
 
 static void
-parse_dhcp_opt(struct action_context *ctx, struct ovnact_dhcp_option *o,
-   bool v6)
+parse_gen_opt(struct action_context *ctx, struct ovnact_gen_option *o,
+  const struct hmap *gen_opts, const char *opts_type)
 {
 if (ctx->lexer->token.type != LEX_T_ID) {
 lexer_syntax_error(ctx->lexer, NULL);
 return;
 }
 
-const char *name = v6 ? "DHCPv6" : "DHCPv4";
-const struct hmap *map = v6 ? ctx->pp->dhcpv6_opts : ctx->pp->dhcp_opts;
-o->option = map ? dhcp_opts_find(map, ctx->lexer->token.s) : NULL;
+o->option = gen_opts ? gen_opts_find(gen_opts, ctx->lexer->token.s) : NULL;
 if (!o->option) {
-lexer_syntax_error(ctx->lexer, "expecting %s option name", name);
+lexer_syntax_error(ctx->lexer, "expecting %s option name", opts_type);
 return;
 }
 lexer_get(ctx->lexer);
@@ 

[ovs-dev] [PATCH v6 0/3] Add IPv6 Router Solicitation responder support

2017-08-14 Thread nusiddiq
From: Numan Siddique 


v5 -> v6

Addressed review comments in p2. The 'put_nd_ra_opts' action now encodes the
complete ICMPv6 RA data in the 'userdata' field of the controller action.

v4 -> v5
---
There were some unrelated changes which crept in when rebasing v4 p1.
Corrected in v5.

v3 -> v4
---
Resolved the merge conflicts and rebased it.
Patch 3 - Addressed review comments from v3

v2 -> v3

Fixed the checkpatch errors and review comments from v2

v1 -> v2
---

The patches p1 and p2 of v1 are merged. p3 of v1 is dropped.

v1
--

This patch series adds IPv6 Router Solicitation responder support.
It picks up from the patches here [1] and resets the version to 1,
since a few of the patches were accepted.

In this new series, continuations are used to support this feature.
When an IPv6 RS packet is received, it is sent to ovn-controller, which
applies the action 'put_nd_ra_opts' to transform it into an IPv6 RA packet
and resumes the packet. ovn-northd adds the necessary flows to send the RA
packet back to the originating port.

[1] - https://mail.openvswitch.org/pipermail/ovs-dev/2017-May/332008.html


Numan Siddique (2):
  ovn util: Refactor dhcp_opts_map to make it generic
  ovn-controller: Add a new action - 'put_nd_ra_opts'

Zong Kai LI (1):
  ovn-northd: Add logical flows to support native IPv6 RA

 include/ovn/actions.h|  30 ++--
 ovn/controller/lflow.c   |  14 +-
 ovn/controller/pinctrl.c |  98 -
 ovn/lib/actions.c| 294 +--
 ovn/lib/automake.mk  |   2 +-
 ovn/lib/logical-fields.c |   4 +
 ovn/lib/{ovn-dhcp.h => ovn-l7.h} | 115 ---
 ovn/northd/ovn-northd.8.xml  |  83 ++-
 ovn/northd/ovn-northd.c  | 151 
 ovn/ovn-nb.ovsschema |   7 +-
 ovn/ovn-nb.xml   |  39 ++
 ovn/ovn-sb.xml   |  81 +++
 ovn/utilities/ovn-trace.c|  52 ---
 tests/ovn.at | 280 +
 tests/test-ovn.c |  16 ++-
 15 files changed, 1136 insertions(+), 130 deletions(-)
 rename ovn/lib/{ovn-dhcp.h => ovn-l7.h} (63%)

-- 
2.13.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v4] route-table: Remove netdevs in netdev_hash when deleted

2017-08-14 Thread fukaige
From: Kaige Fu 

Start a virtual machine with its backend tap device attached to a brought-up
Linux bridge. If we delete the Linux bridge while the VM is still running,
we'll get the following error when trying to create an OVS bridge with the
same name.

The reason is that the ovs-router subsystem adds the Linux bridge to
netdev_shash but does not remove it when the bridge is deleted in this
situation. When the bridge is deleted, OVS will receive an RTM_DELLINK msg,
so take this chance to remove the bridge from netdev_shash.

ovs-vsctl: Error detected while setting up 'br-eth'.  See ovs-vswitchd log for 
details.

ovs-vswitchd log:
2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath 
supports recirculation
2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system: MPLS label 
stack length probed as 1
2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath 
supports unique flow ids
2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath 
supports ct_state
2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system: Datapath 
supports ct_zone
2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system: Datapath 
supports ct_mark
2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system: Datapath 
supports ct_label
2017-05-11T03:45:25.364Z|1|ofproto_dpif_upcall(handler226)|INFO|received 
packet on unassociated datapath port 0
2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command ETHTOOL_GFLAGS 
on network device br-eth failed: No such device
2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed to add 
br-eth as port: No such device
2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using datapath ID 
2a51cf9f2841
2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service controller 
"punix:/var/run/openvswitch/br-eth.mgmt"

Signed-off-by: Kaige Fu 
---
 lib/route-table.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/route-table.c b/lib/route-table.c
index 67fda31..4e40c37 100644
--- a/lib/route-table.c
+++ b/lib/route-table.c
@@ -33,6 +33,7 @@
 #include "ovs-router.h"
 #include "packets.h"
 #include "rtnetlink.h"
+#include "tnl-ports.h"
 #include "openvswitch/vlog.h"
 
 /* Linux 2.6.36 added RTA_MARK, so define it just in case we're building with
@@ -333,10 +334,16 @@ name_table_init(void)
 
 
 static void
-name_table_change(const struct rtnetlink_change *change OVS_UNUSED,
+name_table_change(const struct rtnetlink_change *change,
   void *aux OVS_UNUSED)
 {
 /* Changes to interface status can cause routing table changes that some
  * versions of the linux kernel do not advertise for some reason. */
 route_table_valid = false;
+
+if (change && change->nlmsg_type == RTM_DELLINK) {
+if (change->ifname) {
+tnl_port_map_delete_ipdev(change->ifname);
+}
+}
 }
-- 
1.8.3.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] route-table: Remove netdevs in netdev_hash when deleted

2017-08-14 Thread fukaige


> -Original Message-
> From: Tonghao Zhang [mailto:xiangxia.m@gmail.com]
> Sent: Thursday, August 10, 2017 9:23 AM
> To: fukaige
> Cc: ovs dev
> Subject: Re: [ovs-dev] [PATCH v3] route-table: Remove netdevs in netdev_hash
> when deleted
> 
> why not move the functionality of route_table_link_del into name_table_change()?
> 

Thanks for the review. It seems better to use the existing
name_table_change() function instead of writing a new one.
I will modify it in the v4 patch.

> On Wed, Aug 9, 2017 at 3:41 PM, fukaige  wrote:
> > From: Kaige Fu 
> >
> > Start a virtual machine with its backend tap device attached to a brought up
> linux bridge.
> > If we delete the linux bridge when vm is still running, we'll get the
> > following error when trying to create a ovs bridge with the same name.
> >
> > The reason is that ovs-router subsystem add the linux bridge into
> > netdev_shash, but does not remove it when the bridge is deleted in the
> > situation. When the bridge is deleted, ovs will receive a RTM_DELLINK msg,
> take this chance to remove the bridge in netdev_shash.
> >
> > ovs-vsctl: Error detected while setting up 'br-eth'.  See ovs-vswitchd log 
> > for
> details.
> >
> > ovs-vswitchd log:
> > 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports recirculation
> > 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system:
> > MPLS label stack length probed as 1
> > 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports unique flow ids
> > 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_state
> > 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_zone
> > 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_mark
> > 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system:
> > Datapath supports ct_label
> > 2017-05-11T03:45:25.364Z|1|ofproto_dpif_upcall(handler226)|INFO|re
> > ceived packet on unassociated datapath port 0
> > 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command
> > ETHTOOL_GFLAGS on network device br-eth failed: No such device
> > 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed
> to
> > add br-eth as port: No such device
> > 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using
> > datapath ID 2a51cf9f2841
> > 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service
> controller "punix:/var/run/openvswitch/br-eth.mgmt"
> >
> > Signed-off-by: Kaige Fu 
> > ---
> >  lib/route-table.c | 15 +++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/lib/route-table.c b/lib/route-table.c index
> > 67fda31..5107056 100644
> > --- a/lib/route-table.c
> > +++ b/lib/route-table.c
> > @@ -33,6 +33,7 @@
> >  #include "ovs-router.h"
> >  #include "packets.h"
> >  #include "rtnetlink.h"
> > +#include "tnl-ports.h"
> >  #include "openvswitch/vlog.h"
> >
> >  /* Linux 2.6.36 added RTA_MARK, so define it just in case we're
> > building with @@ -79,6 +80,7 @@ static int route_table_reset(void);
> > static void route_table_handle_msg(const struct route_table_msg *);
> > static int route_table_parse(struct ofpbuf *, struct route_table_msg
> > *);  static void route_table_change(const struct route_table_msg *,
> > void *);
> > +static void route_table_link_del(const struct rtnetlink_change *,
> > +void *);
> >  static void route_map_clear(void);
> >
> >  static void name_table_init(void);
> > @@ -112,6 +114,8 @@ route_table_init(void)
> >  nln_notifier_create(nln, RTNLGRP_IPV6_ROUTE,
> >  (nln_notify_func *) route_table_change,
> > NULL);
> >
> > +rtnetlink_notifier_create(route_table_link_del, NULL);
> > +
> >  route_table_reset();
> >  name_table_init();
> >
> > @@ -297,6 +301,17 @@ route_table_change(const struct route_table_msg
> > *change OVS_UNUSED,  }
> >
> >  static void
> > +route_table_link_del(const struct rtnetlink_change *change,
> > + void *aux OVS_UNUSED) {
> > +if(change && change->nlmsg_type == RTM_DELLINK) {
> > +if(change->ifname) {
> > +tnl_port_map_delete_ipdev(change->ifname);
> > +}
> > +}
> > +}
> > +
> > +static void
> >  route_table_handle_msg(const struct route_table_msg *change)  {
> >  if (change->relevant && change->nlmsg_type == RTM_NEWROUTE) {
> > --
> > 1.8.3.1
> >
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 4/4] dp-packet: Use memcpy on dp_packet elements.

2017-08-14 Thread Darrell Ball
Could you show some throughput results for particular tests?

-Original Message-
From:  on behalf of 
"antonio.fische...@intel.com" 
Date: Friday, August 11, 2017 at 8:52 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v3 4/4] dp-packet: Use memcpy on dp_packet
elements.

memcpy replaces the several individual field copies inside
dp_packet_clone_with_headroom().

Signed-off-by: Antonio Fischetti 
---
I tested this change by comparing the CPU Time over a 60 sec analysis
with VTune.

In original ovs:
dp_packet_clone_with_headroom4.530s

+ this changes:
dp_packet_clone_with_headroom3.920s

Further details were reported in this reply for v1

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DJune_334536.html=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=6qGelsac4wBCA5cHYalnmiB0QySdkGCItzNJ_-twiJg=Wc0BH6ff_CG15qMnkGq8VDI5YmNby5Eo3JxRdZslOsI=
 
---
 lib/dp-packet.c | 18 +-
 lib/dp-packet.h |  6 +-
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 67aa406..f4dbcb7 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -157,8 +157,9 @@ dp_packet_clone(const struct dp_packet *buffer)
 return dp_packet_clone_with_headroom(buffer, 0);
 }
 
-/* Creates and returns a new dp_packet whose data are copied from 
'buffer'.   The
- * returned dp_packet will additionally have 'headroom' bytes of headroom. 
*/
+/* Creates and returns a new dp_packet whose data are copied from 'buffer'.
+ * The returned dp_packet will additionally have 'headroom' bytes of
+ * headroom. */
 struct dp_packet *
 dp_packet_clone_with_headroom(const struct dp_packet *buffer, size_t 
headroom)
 {
@@ -167,13 +168,12 @@ dp_packet_clone_with_headroom(const struct dp_packet 
*buffer, size_t headroom)
 new_buffer = dp_packet_clone_data_with_headroom(dp_packet_data(buffer),
  dp_packet_size(buffer),
  headroom);
-new_buffer->l2_pad_size = buffer->l2_pad_size;
-new_buffer->l2_5_ofs = buffer->l2_5_ofs;
-new_buffer->l3_ofs = buffer->l3_ofs;
-new_buffer->l4_ofs = buffer->l4_ofs;
-new_buffer->md = buffer->md;
-new_buffer->cutlen = buffer->cutlen;
-new_buffer->packet_type = buffer->packet_type;
+/* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+memcpy(&new_buffer->l2_pad_size, &buffer->l2_pad_size,
+sizeof(struct dp_packet) -
+offsetof(struct dp_packet, l2_pad_size));
+
 #ifdef DPDK_NETDEV
 new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
 #else
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 9dbb611..9cab7c7 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -40,7 +40,8 @@ enum OVS_PACKED_ENUM dp_packet_source {
 DPBUF_STACK,   /* Un-movable stack space or static buffer. 
*/
 DPBUF_STUB,/* Starts on stack, may expand into heap. */
 DPBUF_DPDK,/* buffer data is from DPDK allocated 
memory.
-* ref to dp_packet_init_dpdk() in 
dp-packet.c. */
+* ref to dp_packet_init_dpdk() in 
dp-packet.c.
+*/
 };
 
 #define DP_PACKET_CONTEXT_SIZE 64
@@ -61,6 +62,9 @@ struct dp_packet {
 bool rss_hash_valid;/* Is the 'rss_hash' valid? */
 #endif
 enum dp_packet_source source;  /* Source of memory allocated as 
'base'. */
+
+/* All the following elements of this struct are copied by a single 
call
+ * to memcpy in dp_packet_clone_with_headroom. */
 uint8_t l2_pad_size;   /* Detected l2 padding size.
 * Padding is non-pullable. */
 uint16_t l2_5_ofs; /* MPLS label stack offset, or 
UINT16_MAX */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=6qGelsac4wBCA5cHYalnmiB0QySdkGCItzNJ_-twiJg=LREI681tchx40jdHLb94j5YDPgqXZnbI9_Yv4QNnqyA=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 3/4] conntrack: pass current time to conntrack_execute.

2017-08-14 Thread Darrell Ball
I did not try it yet, but it seems reasonable.

-Original Message-
From:  on behalf of 
"antonio.fische...@intel.com" 
Date: Friday, August 11, 2017 at 8:52 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v3 3/4] conntrack: pass current time to   
conntrack_execute.

The current time is passed to conntrack_execute() so it doesn't have
to recompute it.

Signed-off-by: Antonio Fischetti 
Acked-by: Sugesh Chandran 
---
In a firewall testbench set up with

 table=0, priority=1 actions=drop
 table=0, priority=10,arp actions=NORMAL
 table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
 table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
 table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
 table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
 table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1

I measured packet Rx rate in Mpps (regardless of packet loss).
Bidirectional test with 64B UDP packets.

2 PMDs, 3 Tx queues.

I collected the following performance figures.

Orig OvS-DPDK means Commit ID:
  6b1babacc3ca0488e07596bf822fe356c9bab646

------------+----------------+----------------+
UDP         | Orig OvS-DPDK +| Previous case  |
connections | patches #1,2   | + this patch   |
------------+----------------+----------------+
          1 |   1.42, 1.42   |   1.82, 1.76   |
         10 |   2.35, 2.35   |   2.35, 2.35   |
         50 |   2.38, 2.42   |   2.41, 2.43   |
        100 |   2.49, 2.49   |   2.50, 2.52   |
        500 |   2.11, 2.10   |   2.21, 2.22   |
       1000 |   2.02, 2.00   |   2.09, 2.11   |
       5000 |   1.94, 1.87   |   1.93, 1.89   |
------------+----------------+----------------+
---
 lib/conntrack.c| 4 ++--
 lib/conntrack.h| 3 ++-
 lib/dpif-netdev.c  | 2 +-
 tests/test-conntrack.c | 8 +---
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 1c0e023..c61bcd6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1136,13 +1136,13 @@ conntrack_execute(struct conntrack *ct, struct 
dp_packet_batch *pkt_batch,
   const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper,
-  const struct nat_action_info_t *nat_action_info)
+  const struct nat_action_info_t *nat_action_info,
+  long long now)
 {
 
 struct dp_packet **pkts = pkt_batch->packets;
 size_t cnt = pkt_batch->count;
 struct conn_lookup_ctx ctx;
-long long now = time_msec();
 
 for (size_t i = 0; i < cnt; i++) {
 if (!conn_key_extract(ct, pkts[i], dl_type, &ctx, zone)) {
diff --git a/lib/conntrack.h b/lib/conntrack.h
index 272d2d5..fbeef1c 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -95,7 +95,8 @@ int conntrack_execute(struct conntrack *, struct 
dp_packet_batch *,
   uint16_t zone, const uint32_t *setmark,
   const struct ovs_key_ct_labels *setlabel,
   const char *helper,
-  const struct nat_action_info_t *nat_action_info);
+  const struct nat_action_info_t *nat_action_info,
+  long long now);
 
 struct conntrack_dump {
 struct conntrack *ct;
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0db6f83..6aeedf8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5471,7 +5471,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch 
*packets_,
 
 conntrack_execute(&dp->conntrack, packets_, aux->flow->dl_type, 
force,
   commit, zone, setmark, setlabel, helper,
-  nat_action_info_ref);
+  nat_action_info_ref, now);
 break;
 }
 
diff --git a/tests/test-conntrack.c b/tests/test-conntrack.c
index 6a77db8..b27d181 100644
--- a/tests/test-conntrack.c
+++ b/tests/test-conntrack.c
@@ -84,12 +84,13 @@ ct_thread_main(void *aux_)
 struct dp_packet_batch *pkt_batch;
 ovs_be16 dl_type;
 size_t i;
+long long now = time_msec();
 
 pkt_batch = prepare_packets(batch_size, change_conn, aux->tid, &dl_type);
_type);
 ovs_barrier_block();
 for (i = 0; i < n_pkts; i += batch_size) {
 conntrack_execute(&ct, pkt_batch, dl_type, false, true, 0, NULL, 
NULL,
-  NULL, NULL);

Re: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when EMC is disabled

2017-08-14 Thread Darrell Ball
I did not try it yet, but it seems reasonable.
If the hash is needed for something else, it will be read at that point.

-Original Message-
From:  on behalf of 
"antonio.fische...@intel.com" 
Date: Friday, August 11, 2017 at 8:52 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v3 2/4] dpif-netdev: Avoid reading RSS hash when  
EMC is disabled

When EMC is disabled, reading the RSS hash is skipped.
Also, for packets that are not recirculated, the hash value is
retrieved without considering the recirc id.

Signed-off-by: Antonio Fischetti 
---
This patch depends on previous patch in this series.

Port-to-Port Test
=
Software built with "-O2 -march=native -g".

I measured the Rx rate regardless of pkt loss by sending 1 UDP flow,
64B packets, at the line-rate.

2 PMDs with 3 Tx queues.

Flows setup:
  in_port=dpdk0 actions=output:dpdk1
  in_port=dpdk1 actions=output:dpdk0

Results
---
Values are for the Rx rate in Mpps, regardless of packet loss.

RSS column:
   Yes: RSS hash is provided by the NIC
   No: RSS is disabled and the 5-tuple hash must be
   computed in software.

EMC column:
   Yes: default probability insertion,
   No: EMC disabled.

Orig OvS-DPDK means Commit ID:
  6b1babacc3ca0488e07596bf822fe356c9bab646

            +--------------+----------------+----------------+
+-----+-----+     Orig     | Orig + patch 1 | Orig + patch 1 |
| RSS | EMC |              |                |  + this patch  |
+-----+-----+--------------+----------------+----------------+
| Yes | Yes | 11.99, 11.41 | 12.20, 11.31   | 12.20, 11.31   |
| Yes |  No |  8.32,  8.42 |  8.35,  8.39   |  8.62,  8.62   |
+-----+-----+--------------+----------------+----------------+
|  No | Yes |  9.87, 11.15 |  9.79, 11.20   |  9.85, 11.09   |
|  No |  No |  7.82,  7.84 |  7.84,  7.93   |  8.40,  8.38   |
+-----+-----+--------------+----------------+----------------+
---
 lib/dpif-netdev.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 8f6b96b..0db6f83 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4578,6 +4578,22 @@ dp_netdev_upcall(struct dp_netdev_pmd_thread *pmd, 
struct dp_packet *packet_,
 }
 
 static inline uint32_t
+dpif_netdev_packet_get_rss_hash_orig_pkt(struct dp_packet *packet,
+const struct miniflow *mf)
+{
+uint32_t hash;
+
+if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
+hash = dp_packet_get_rss_hash(packet);
+} else {
+hash = miniflow_hash_5tuple(mf, 0);
+dp_packet_set_rss_hash(packet, hash);
+}
+
+return hash;
+}
+
+static inline uint32_t
 dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
 const struct miniflow *mf)
 {
@@ -4715,7 +4731,6 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 miniflow_extract(packet, &key->mf);
 key->len = 0; /* Not computed yet. */
-key->hash = dpif_netdev_packet_get_rss_hash(packet, &key->mf);
 
 /*
  * EMC lookup is skipped when one or both of the following
@@ -4732,9 +4747,12 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
  */
 if (OVS_LIKELY(cur_min)) {
 if (!md_is_valid) {
+key->hash = 
dpif_netdev_packet_get_rss_hash_orig_pkt(packet,
+&key->mf);
 flow = emc_lookup(flow_cache, key);
 } else {
 /* Recirculated packet. */
+key->hash = dpif_netdev_packet_get_rss_hash(packet, 
&key->mf);
 if (flow_cache->n_entries & 
EMC_RECIRCT_NO_INSERT_THRESHOLD) {
 /* EMC occupancy is over the threshold.  We skip EMC
  * lookup for recirculated packets. */
-- 
2.4.11

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=Kp92W-efwcliK8l_PTI61aY9T1CPqdhzlAxSHGRMad8=NdrRKHn6b0aW6Gbj5wOH3_Rkon2dcVgoyPBzfzJs_E4=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for recirc packets

2017-08-14 Thread Darrell Ball


-Original Message-
From:  on behalf of 
"antonio.fische...@intel.com" 
Date: Friday, August 11, 2017 at 8:52 AM
To: "d...@openvswitch.org" 
Subject: [ovs-dev] [PATCH v3 1/4] dpif-netdev: Skip EMC lookup/insert for   
recirc packets

When OVS is configured as a firewall, with thousands of active
concurrent connections, the EMC gets quickly saturated and may
come under heavy thrashing because original and recirculated
packets keep overwriting the existing active EMC entries due to
its limited size (8k).


The recirculated packet could have been modified, in which case maybe we
still want to do the EMC lookup/insert?


This thrashing causes the EMC to be less efficient than the dpcls
in terms of lookups and insertions.

This patch allows the EMC to be used efficiently by letting only
'original' packets hit the EMC. All recirculated packets are
sent to the classifier directly.
An empirical EMC-occupancy threshold, EMC_RECIRCT_NO_INSERT_THRESHOLD
(50%), is set to trigger this logic. When EMC utilization exceeds
EMC_RECIRCT_NO_INSERT_THRESHOLD (see the sketch below):
 - EMC insertions are allowed just for original packets.
   EMC insertion and lookup are skipped for recirculated packets.
 - Recirculated packets are sent to the classifier.
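
A minimal sketch of that gating, assuming the names from the commit
message (the full hunk is truncated in this archive; this mirrors the
fragments visible in the context lines of patch 2/4 above):

    if (OVS_LIKELY(cur_min)) {
        if (!md_is_valid) {
            /* Original packet: EMC lookup (and insertion) as usual. */
            flow = emc_lookup(flow_cache, key);
        } else if (flow_cache->n_entries & EMC_RECIRCT_NO_INSERT_THRESHOLD) {
            /* Recirculated packet and EMC occupancy is over the
             * threshold: skip EMC lookup/insertion, go straight to
             * the dpcls. */
            flow = NULL;
        } else {
            flow = emc_lookup(flow_cache, key);
        }
    }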

This patch is based on patch
"dpif-netdev: add EMC entry count and %full figure to pmd-stats-show" at:

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2017-2DJanuary_327570.html=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=NHY06RD-Bcweizxd86m6hcsLPKpe7a4WVSyh9aNZQlo=-PhWyltJ71UipVzd1D0H0I9k4uSTLdCJ_zanXxHd7fo=
 

CC: Jan Scheurich 
Signed-off-by: Antonio Fischetti 
Signed-off-by: Bhanuprakash Bodireddy 
Co-authored-by: Bhanuprakash Bodireddy 
---
Connection Tracker testbench set up with

 table=0, priority=1 actions=drop
 table=0, priority=10,arp actions=NORMAL
 table=0, priority=100,ct_state=-trk,ip actions=ct(table=1)
 table=1, ct_state=+new+trk,ip,in_port=1 actions=ct(commit),output:2
 table=1, ct_state=+est+trk,ip,in_port=1 actions=output:2
 table=1, ct_state=+new+trk,ip,in_port=2 actions=drop
 table=1, ct_state=+est+trk,ip,in_port=2 actions=output:1

2 PMDs, 3 Tx queues.

I measured packet Rx rate (regardless of packet loss). Bidirectional
test with 64B UDP packets.
Each row is a test with a different number of traffic streams. The traffic
generator is set so that each stream establishes one UDP connection.
Mpps columns report the Rx rates on the 2 sides.

I set up the generator to loop on the dest IP addr on one side,
and loop instead on the source IP addr on the other side.

For example to generate 10 different flows, I was sending to phy port #1
UDP, IPsrc:10.10.10.10, IPdest: 20.20.20.[20-29], PortSrc: 63, PortDest: 63

Instead to phy port #2 (source and dest IPs are now swapped):
UDP, IPsrc: 20.20.20.[20-29], IPdest: 10.10.10.10, PortSrc: 63, PortDest: 63

I saw the following performance improvement.

Original OvS-DPDK means Commit ID:
  6b1babacc3ca0488e07596bf822fe356c9bab646

 ---------+----------------------+----------------------+
          |  Original OvS-DPDK   |  Original OvS-DPDK   |
          |                      |  + this patch        |
 ---------+------------+---------+------------+---------+
  Traffic |     Rx     |   EMC   |     Rx     |   EMC   |
  Streams |   [Mpps]   | entries |   [Mpps]   | entries |
 ---------+------------+---------+------------+---------+
      100 | 2.43, 2.49 |   200   | 2.55, 2.57 |   201   |
    1,000 | 2.01, 2.02 |  2007   | 2.12, 2.12 |  2006   |
    2,000 | 1.93, 1.95 |  3868   | 1.98, 1.96 |  3884   |
    3,000 | 1.87, 1.91 |  5086   | 1.97, 1.97 |  4757   |
    4,000 | 1.83, 1.82 |  6173   | 1.94, 1.93 |  5280   |
   10,000 | 1.67, 1.69 |  7826   | 1.82, 1.81 |  7090   |
   30,000 | 1.57, 1.59 |  8192   | 1.66, 1.67 |  8192   |
 ---------+------------+---------+------------+---------+

This test setup implies 1 recirculation on each received packet.
We didn't check this patch in a test scenario where more than one
recirculation occurs per packet.
---
 lib/dpif-netdev.c | 65 +++
 1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index bea1c3f..8f6b96b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4663,6 +4663,9 @@ dp_netdev_queue_batches(struct dp_packet 

Re: [ovs-dev] OVS 2.9 Intel Roadmap

2017-08-14 Thread Darrell Ball
Any prioritization or other QoS features missing?


-Original Message-
From:  on behalf of "Stokes, Ian" 

Date: Friday, August 11, 2017 at 8:54 AM
To: "ovs-dev@openvswitch.org" 
Subject: [ovs-dev] OVS 2.9 Intel Roadmap

Below are the features that Intel are planning to submit for the OVS 2.9 
release.

I've also created a google spreadsheet so that people can contribute their 
own planned features. This can be found at:


https://docs.google.com/spreadsheets/d/1FilGq46vQePFKoehADWWsDvDCZSNsMU7PrpPr3EM3lU/edit#gid=0

If people can complete this doc and send an email to the dev mailing list 
then we can formalize the features and commit them to a doc in the OVS repo.

It would be good if others are also willing to share their plans so that we 
can build up a complete picture of what's planned for the OVS 2.9 release and 
make sure there's no duplication.

Keepalive: The Keepalive feature is aimed at achieving fastpath service 
assurance in OVS-DPDK deployments. It adds support for monitoring the packet 
processing threads by dispatching heartbeats at regular intervals; the 
status is updated periodically in OVSDB. In case of missed heartbeats, the 
failure is detected and reported to higher-level fault management 
systems/frameworks. The status can be monitored from the OpenStack Ceilometer 
service. A toy sketch of the watchdog idea follows below.
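
A toy sketch of that watchdog idea, making no assumptions about the
eventual OVS design (all names below are illustrative): each PMD thread
stamps a shared timestamp every polling iteration, and a monitor thread
flags threads whose stamp goes stale.

#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Illustrative only -- not the proposed OVS design. */
struct pmd_heartbeat {
    atomic_long last_seen;       /* Wall-clock seconds, set by the PMD. */
};

/* Called by the PMD thread once per polling iteration. */
static inline void
pmd_heartbeat_beat(struct pmd_heartbeat *hb)
{
    atomic_store(&hb->last_seen, (long) time(NULL));
}

/* Called by the monitor thread; true if the PMD looks unresponsive. */
static inline bool
pmd_heartbeat_stale(struct pmd_heartbeat *hb, long timeout_s)
{
    return (long) time(NULL) - atomic_load(&hb->last_seen) > timeout_s;
}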

vHost PMD: The vHost PMD brings vHost User ports under the control of DPDK's 
librte_ether library; in doing so, DPDK's librte_vhost library will no longer 
be directly referenced in OVS. Potential benefits include shared code among 
port types in netdev-dpdk and smoother upgrades between DPDK versions in OVS.

Multi-segment Mbuf: Support for jumbo frames was added to netdev-dpdk in 
v2.6.0. In that implementation, a jumbo frame is stored within a single-segment 
mbuf. That mode will remain the default, but an option will now be added to 
allow a jumbo frame to be stored within a multi-segment mbuf. This reduces the 
requirement for large, contiguous blocks of memory, and may be useful in 
deployments with limited memory.

Conn Track: Analyze and improve Connection Tracker performance:
The Connection Tracker is a feature to manage stateful connections and 
implement security firewalls.
This allows better protection against attacks and helps in load 
balancing. The trade-off is a significant impact on overall performance. 
This work aims to analyze possible bottlenecks, also considering the 
latest protocol implementations, and to improve Connection Tracker performance.

IPSEC: This feature looks to introduce IPsec into OVS with DPDK. IPsec 
would function in transport mode and would be used in conjunction with existing 
encapsulation methods (initially VxLAN) to create a new interface type 
'vxlanipsec'. The DPDK cryptodev framework will be used to handle 
cipher/digest operations as part of the encap/decap actions in accordance with 
a security association. The cryptodev devices supported would be limited to 
virtual crypto devices such as the AESNI-MB vdev. As such, the cipher and digest 
algorithms supported would be limited to the capabilities of the vdev. In terms 
of Security Association generation for IPsec policies, the feature will allow a 
user to specify security associations via command-line options for the 
interface.

Zero copy: Support for enabling DPDK's 'dequeue zero copy' feature on vHost 
User ports, which removes the costly memcpy operation from guest to host when 
enabled. Detailed information can be found in the DPDK documentation: 
http://dpdk.org/doc/guides/prog_guide/vhost_lib.html#vhost-api-overview
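
For reference, a minimal sketch of the underlying DPDK call, assuming
the flag is passed at vhost socket registration (the socket path is
illustrative; how OVS wires this up is part of the feature work):

#include <rte_vhost.h>

/* Sketch: register a vhost-user socket with dequeue zero copy enabled.
 * RTE_VHOST_USER_DEQUEUE_ZERO_COPY is DPDK's flag for the feature; the
 * socket path below is illustrative only. */
static int
register_vhost_zero_copy(void)
{
    uint64_t flags = RTE_VHOST_USER_DEQUEUE_ZERO_COPY;

    return rte_vhost_driver_register("/tmp/vhost-user0.sock", flags);
}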

DPDK 17.05.2 and 17.11 support: DPDK 17.05.2 is the latest stable release 
of DPDK and includes numerous bug fixes and stability updates.
DPDK 17.11 will be the next DPDK Long Term Support (LTS) release (TBD). It 
will also include numerous new features, such as Generic Segmentation Offload 
(GSO) and vHost PMD.

Upcall performance: The first packet in every flow handled by OVS triggers 
an upcall to the exception path. This is a performance bottleneck for some 
use cases. Analysis of, and improvements to, the performance of this code path 
will be investigated.

Service Assurance: Virtual switch reporting of flow telemetry via IPFIX.

Extended NIC Stats: Expose low-level driver statistics registers to the 
user through DPDK's extended NIC statistics (xstats) API for physical 
interfaces. A sketch of querying that API follows below.
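
A minimal sketch of querying that API per port with DPDK's
rte_eth_xstats_get()/rte_eth_xstats_get_names() (error handling
trimmed; not the planned OVS integration):

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <rte_ethdev.h>

/* Print every extended statistic exposed by one port. */
static void
print_port_xstats(uint8_t port_id)
{
    /* A NULL buffer asks DPDK how many xstats the port exposes. */
    int n = rte_eth_xstats_get(port_id, NULL, 0);
    if (n <= 0) {
        return;
    }

    struct rte_eth_xstat *vals = calloc(n, sizeof *vals);
    struct rte_eth_xstat_name *names = calloc(n, sizeof *names);

    if (vals && names
        && rte_eth_xstats_get(port_id, vals, n) == n
        && rte_eth_xstats_get_names(port_id, names, n) == n) {
        for (int i = 0; i < n; i++) {
            printf("%s: %" PRIu64 "\n", names[i].name, vals[i].value);
        }
    }
    free(vals);
    free(names);
}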

NAT 

Re: [ovs-dev] [PATCH] rhel: Add support for "systemctl reload openvswitch"

2017-08-14 Thread Markos Chandras
On 08/08/2017 11:01 PM, Timothy M. Redaelli wrote:
> 
> The script is actually "specialized" to how Fedora/RHEL starts
> openvswitch since, I think, only Fedora/RHEL have ovsdb-server and
> ovs-vswitchd as split systemd unit files.

For the record, SUSE uses the same approach

> [...]
> 
> Sounds like a good suggestion.
> 
> I'll do another patchset with a commit that adds the --bundle option +
> get_highest_ofp_version in ovs-save and another commit that changes
> ovs-reload script to use it.
> 
> get_highest_ofp_version is used to find if we can use --bundle and to
> make ovs-ofctl work when you have a bridge with only OpenFlow15+
> protocols enabled.
> 
> Any other suggestions?

The rest looks good to me. Thank you

-- 
markos

SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev