Re: [ovs-dev] [PATCH] NSH: Fix NSH-related length macros that cause stack overflow

2018-10-26 Thread Yifeng Sun
Thanks Ben for code review. I will leave explanation here for later
reference.

When md2 context is a string of size 248, the headers's total length would
be 256
bytes. Because there is only 6 bits in ver_flags_ttl_len for length value,
the value of ver_flags_ttl_len is a wrong value of 0.

In format_odp_push_nsh_action(), the first line of code is:
size_t mdlen = nsh_hdr_len(nsh_hdr) - NSH_BASE_HDR_LEN;
nsh_hdr_len(nsh_hdr) returns 0. As a result, mdlen will be a very large
value.
Later in this function, when mdlen is used as the upper limit to access
md2_ctx,
stack overflow will happen.


On Fri, Oct 26, 2018 at 3:05 PM Ben Pfaff  wrote:

> On Fri, Oct 26, 2018 at 02:55:55PM -0700, Ben Pfaff wrote:
> > On Thu, Oct 25, 2018 at 02:41:50PM -0700, Yifeng Sun wrote:
> > > In the filed of ver_flags_ttl_len of struct nshhdr, there are only 6
> > > bits that are used to indicate header's total length in 4-byte words.
> > > Therefore, the max value for total is 252 (63x4), instead of 256 used
> > > in present code base. This patch fixes it.
> > >
> > > Reported-at:
> https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10855
> > > Signed-off-by: Yifeng Sun 
> >
> > Thanks for the patch and the bug fix.
> >
> > Would you mind adding a few words to the commit message that explains
> > how this can lead to stack overflow?
>
> Oops, I accidentally applied this anyway.  Never mind on the commit
> message update.
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/8] ofp-table: Always format the table number in table features.

2018-10-26 Thread Ben Pfaff
On Tue, Oct 23, 2018 at 02:59:08PM -0700, Justin Pettit wrote:
> 
> > On Aug 30, 2018, at 1:00 PM, Ben Pfaff  wrote:
> > 
> > Table features should indicate the table number as well as the table
> > name.  Before this, the first line for each table looked like this:
> >   table myname ("myname"):
> > but it's more useful if it's:
> >   table 123 ("myname"):
> > 
> > Signed-off-by: Ben Pfaff 
> 
> Acked-by: Justin Pettit 

Thanks for the reviews.  I applied patches 1 to 4 to master.  I'll wait
for reviews of the remaining patches.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofp-msgs: Added ONF_ and NXT_REQUESTFORWARD for OF1.0-1.3

2018-10-26 Thread Ben Pfaff
On Fri, Oct 26, 2018 at 03:06:28PM -0700, Zak Whittington wrote:
> Backported OFPT14_REQUESTFORWARD to OF1.0-1.3.
> OF 1.0-1.2 use an NXT Nicira extension while OF 1.3
> uses an ONF extension (ONF version is specified in a
> previously published ONF spec sheet).
> 
> Includes ofp-print tests for multiple inner message
> types, and multiple OF versions including the NXT and ONF.
> Also includes more end-to-end ofproto tests for both
> NXT OF1.0 and also ONF OF1.3.
> 
> VMware-BZ: 2136594
> Signed-off-by: Zak Whittington 

Clang said:

../lib/ofp-monitor.c:817:24: error: implicit conversion from enumeration 
type 'enum ofpraw' to different enumeration type 'enum ofptype' 
[-Werror,-Wenum-conversion]
../lib/ofp-monitor.c:819:24: error: implicit conversion from enumeration 
type 'enum ofpraw' to different enumeration type 'enum ofptype' 
[-Werror,-Wenum-conversion]
../lib/ofp-monitor.c:821:24: error: implicit conversion from enumeration 
type 'enum ofpraw' to different enumeration type 'enum ofptype' 
[-Werror,-Wenum-conversion]
../lib/ofp-monitor.c:824:45: error: implicit conversion from enumeration 
type 'enum ofptype' to different enumeration type 'enum ofpraw' 
[-Werror,-Wenum-conversion]

so I changed the type of raw_msg_type from enum ofptype to enum ofpraw,
and applied this to master.

Thanks again!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ofp-msgs: Added ONF_ and NXT_REQUESTFORWARD for OF1.0-1.3

2018-10-26 Thread Zak Whittington
Backported OFPT14_REQUESTFORWARD to OF1.0-1.3.
OF 1.0-1.2 use an NXT Nicira extension while OF 1.3
uses an ONF extension (ONF version is specified in a
previously published ONF spec sheet).

Includes ofp-print tests for multiple inner message
types, and multiple OF versions including the NXT and ONF.
Also includes more end-to-end ofproto tests for both
NXT OF1.0 and also ONF OF1.3.

VMware-BZ: 2136594
Signed-off-by: Zak Whittington 
---
 include/openvswitch/ofp-monitor.h |   1 +
 include/openvswitch/ofp-msgs.h|   8 +-
 lib/ofp-monitor.c |  22 +++--
 ofproto/connmgr.c |   7 +-
 tests/ofp-print.at|  76 +++
 tests/ofproto.at  | 199 ++
 6 files changed, 305 insertions(+), 8 deletions(-)

diff --git a/include/openvswitch/ofp-monitor.h 
b/include/openvswitch/ofp-monitor.h
index 1bfcf92..5951260 100644
--- a/include/openvswitch/ofp-monitor.h
+++ b/include/openvswitch/ofp-monitor.h
@@ -116,6 +116,7 @@ struct ofpbuf *ofputil_encode_flow_monitor_cancel(uint32_t 
id);
 
 struct ofputil_requestforward {
 ovs_be32 xid;
+/* Also used for OF 1.0-1.3 when using Nicira Extension: */
 enum ofp14_requestforward_reason reason;
 union {
 /* reason == OFPRFR_METER_MOD. */
diff --git a/include/openvswitch/ofp-msgs.h b/include/openvswitch/ofp-msgs.h
index 8a32a3d..cacba41 100644
--- a/include/openvswitch/ofp-msgs.h
+++ b/include/openvswitch/ofp-msgs.h
@@ -274,6 +274,10 @@ enum ofpraw {
 /* OFPT 1.4+ (31): struct ofp14_table_status, uint8_t[8][]. */
 OFPRAW_OFPT14_TABLE_STATUS,
 
+/* NXT 1.0-1.2 (132): struct ofp14_requestforward, uint8_t[8][]. */
+OFPRAW_NXT_REQUESTFORWARD,
+/* ONFT 1.3 (2350): struct ofp14_requestforward, uint8_t[8][]. */
+OFPRAW_ONFT13_REQUESTFORWARD,
 /* OFPT 1.4+ (32): struct ofp14_requestforward, uint8_t[8][]. */
 OFPRAW_OFPT14_REQUESTFORWARD,
 
@@ -645,7 +649,9 @@ enum ofptype {
* OFPRAW_OFPT14_ROLE_STATUS. */
 
 /* Request forwarding by the switch. */
-OFPTYPE_REQUESTFORWARD,   /* OFPRAW_OFPT14_REQUESTFORWARD. */
+OFPTYPE_REQUESTFORWARD,   /* OFPRAW_NXT_REQUESTFORWARD.
+   * OFPRAW_ONFT13_REQUESTFORWARD.
+   * OFPRAW_OFPT14_REQUESTFORWARD. */
 
 /* Asynchronous messages. */
 OFPTYPE_TABLE_STATUS,  /* OFPRAW_OFPT14_TABLE_STATUS. */
diff --git a/lib/ofp-monitor.c b/lib/ofp-monitor.c
index d1853d9..9e0983c 100644
--- a/lib/ofp-monitor.c
+++ b/lib/ofp-monitor.c
@@ -785,13 +785,13 @@ ofputil_flow_update_format(struct ds *s,
 }
 }
 
-/* Encodes 'rf' according to 'protocol', and returns the encoded message.
- * 'protocol' must be for OpenFlow 1.4 or later. */
+/* Encodes 'rf' according to 'protocol', and returns the encoded message. */
 struct ofpbuf *
 ofputil_encode_requestforward(const struct ofputil_requestforward *rf,
   enum ofputil_protocol protocol)
 {
 enum ofp_version ofp_version = ofputil_protocol_to_ofp_version(protocol);
+enum ofptype raw_msg_type;
 struct ofpbuf *inner;
 
 switch (rf->reason) {
@@ -813,9 +813,16 @@ ofputil_encode_requestforward(const struct 
ofputil_requestforward *rf,
 inner_oh->xid = rf->xid;
 inner_oh->length = htons(inner->size);
 
-struct ofpbuf *outer = ofpraw_alloc_xid(OFPRAW_OFPT14_REQUESTFORWARD,
-ofp_version, htonl(0),
-inner->size);
+if (ofp_version < OFP13_VERSION) {
+raw_msg_type = OFPRAW_NXT_REQUESTFORWARD;
+} else if (ofp_version == OFP13_VERSION) {
+raw_msg_type = OFPRAW_ONFT13_REQUESTFORWARD;
+} else {
+raw_msg_type = OFPRAW_OFPT14_REQUESTFORWARD;
+}
+
+struct ofpbuf *outer = ofpraw_alloc_xid(raw_msg_type, ofp_version,
+htonl(0), inner->size);
 ofpbuf_put(outer, inner->data, inner->size);
 ofpbuf_delete(inner);
 
@@ -836,7 +843,10 @@ ofputil_decode_requestforward(const struct ofp_header 
*outer,
 struct ofpbuf b = ofpbuf_const_initializer(outer, ntohs(outer->length));
 
 /* Skip past outer message. */
-ovs_assert(ofpraw_pull_assert() == OFPRAW_OFPT14_REQUESTFORWARD);
+enum ofpraw raw_msg_type = ofpraw_pull_assert();
+ovs_assert(raw_msg_type == OFPRAW_OFPT14_REQUESTFORWARD ||
+   raw_msg_type == OFPRAW_ONFT13_REQUESTFORWARD ||
+   raw_msg_type == OFPRAW_NXT_REQUESTFORWARD);
 
 /* Validate inner message. */
 if (b.size < sizeof(struct ofp_header)) {
diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 9431200..db9f6ba 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1691,8 +1691,13 @@ connmgr_send_requestforward(struct connmgr *mgr, const 
struct ofconn *source,
 struct ofconn *ofconn;
 
 LIST_FOR_EACH (ofconn, node, >all_conns) {
+/* METER_MOD 

Re: [ovs-dev] [PATCH] ovs-ofctl: Update documentation for MPLS actions.

2018-10-26 Thread Ben Pfaff
On Tue, Oct 23, 2018 at 02:00:20PM -0700, Justin Pettit wrote:
> 
> > On Oct 23, 2018, at 9:26 AM, Ben Pfaff  wrote:
> > 
> > Reported-by: lidejun 
> > Signed-off-by: Ben Pfaff 
> 
> Acked-by: Justin Pettit 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] NSH: Fix NSH-related length macros that cause stack overflow

2018-10-26 Thread Ben Pfaff
On Fri, Oct 26, 2018 at 02:55:55PM -0700, Ben Pfaff wrote:
> On Thu, Oct 25, 2018 at 02:41:50PM -0700, Yifeng Sun wrote:
> > In the filed of ver_flags_ttl_len of struct nshhdr, there are only 6
> > bits that are used to indicate header's total length in 4-byte words.
> > Therefore, the max value for total is 252 (63x4), instead of 256 used
> > in present code base. This patch fixes it.
> > 
> > Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10855
> > Signed-off-by: Yifeng Sun 
> 
> Thanks for the patch and the bug fix.
> 
> Would you mind adding a few words to the commit message that explains
> how this can lead to stack overflow?

Oops, I accidentally applied this anyway.  Never mind on the commit
message update.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] NSH: Fix NSH-related length macros that cause stack overflow

2018-10-26 Thread Ben Pfaff
On Thu, Oct 25, 2018 at 02:41:50PM -0700, Yifeng Sun wrote:
> In the filed of ver_flags_ttl_len of struct nshhdr, there are only 6
> bits that are used to indicate header's total length in 4-byte words.
> Therefore, the max value for total is 252 (63x4), instead of 256 used
> in present code base. This patch fixes it.
> 
> Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10855
> Signed-off-by: Yifeng Sun 

Thanks for the patch and the bug fix.

Would you mind adding a few words to the commit message that explains
how this can lead to stack overflow?

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofctl_parse_target: Fix a bug that uses incomplete ofputil_flow_mod data

2018-10-26 Thread Ben Pfaff
On Thu, Oct 25, 2018 at 04:17:23PM -0700, Yifeng Sun wrote:
> When parse_ofp_flow_mod_str returns error, `fm` is incomplete and pointers
> in it may be null, e.g. fm.match.flow. In this case, passing it to
> ofctl_parse_flows__ may cause pointer errors because ofctl_parse_flows__
> expects a valid input of type struct ofputil_flow_mod.
> 
> Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=0
> Signed-off-by: Yifeng Sun 

Thanks, applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] OVN: introduce mac_prefix support to IPAM

2018-10-26 Thread Mark Michelson

Looks good, Lorenzo,

Just to reaaffirm my previous ack:

Acked-by: Mark Michelson 

On 10/26/2018 12:20 PM, Lorenzo Bianconi wrote:

Add the possibility to specify a given mac address prefix for
dynamically generated mac address. Mac address prefix can be
specified in nbdb NB_Global table, options:mac_prefix=
This patch fix a possible issue of L2 address duplication if
multiple OVN deployments share a single broadcast domain

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
Changes since v1:
- use a global definition for mac_prefix
---
  ovn/northd/ovn-northd.c | 37 ++---
  ovn/ovn-nb.xml  |  5 +
  tests/ovn.at| 17 +
  3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 439651f80..816c72311 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -68,6 +68,7 @@ static const char *unixctl_path;
  /* MAC address management (macam) table of "struct eth_addr"s, that holds the
   * MAC addresses allocated by the OVN ipam module. */
  static struct hmap macam = HMAP_INITIALIZER();
+static struct eth_addr mac_prefix;
  
  #define MAX_OVN_TAGS 4096

  
@@ -922,10 +923,17 @@ ipam_insert_mac(struct eth_addr *ea, bool check)
  }
  
  uint64_t mac64 = eth_addr_to_uint64(*ea);

+uint64_t prefix;
+
+if (!eth_addr_is_zero(mac_prefix)) {
+prefix = eth_addr_to_uint64(mac_prefix);
+} else {
+prefix = MAC_ADDR_PREFIX;
+}
  /* If the new MAC was not assigned by this address management system or
   * check is true and the new MAC is a duplicate, do not insert it into the
   * macam hmap. */
-if (((mac64 ^ MAC_ADDR_PREFIX) >> 24)
+if (((mac64 ^ prefix) >> 24)
  || (check && ipam_is_duplicate_mac(ea, mac64, true))) {
  return;
  }
@@ -1036,7 +1044,11 @@ ipam_get_unused_mac(void)
  for (i = 0; i < MAC_ADDR_SPACE - 1; i++) {
  /* The tentative MAC's suffix will be in the interval (1, 0xfe). 
*/
  mac_addr_suffix = ((last_mac + i) % (MAC_ADDR_SPACE - 1)) + 1;
-mac64 = MAC_ADDR_PREFIX | mac_addr_suffix;
+if (!eth_addr_is_zero(mac_prefix)) {
+mac64 =  eth_addr_to_uint64(mac_prefix) | mac_addr_suffix;
+} else {
+mac64 = MAC_ADDR_PREFIX | mac_addr_suffix;
+}
  eth_addr_from_uint64(mac64, );
  if (!ipam_is_duplicate_mac(, mac64, false)) {
  last_mac = mac_addr_suffix;
@@ -1107,7 +1119,15 @@ dynamic_mac_changed(const char *lsp_addresses,
 }
  
 uint64_t mac64 = eth_addr_to_uint64(update->current_addresses.ea);

-   if ((mac64 ^ MAC_ADDR_PREFIX) >> 24) {
+   uint64_t prefix;
+
+   if (!eth_addr_is_zero(mac_prefix)) {
+   prefix = eth_addr_to_uint64(mac_prefix);
+   } else {
+   prefix = MAC_ADDR_PREFIX;
+   }
+
+   if ((mac64 ^ prefix) >> 24) {
 return DYNAMIC;
 } else {
 return NONE;
@@ -7141,6 +7161,17 @@ ovnnb_db_run(struct northd_context *ctx,
  sbrec_sb_global_set_options(sb, >options);
  sb_loop->next_cfg = nb->nb_cfg;
  
+const char *mac_addr_prefix = smap_get(>options, "mac_prefix");

+if (mac_addr_prefix) {
+struct eth_addr addr;
+
+memset(, 0, sizeof addr);
+if (ovs_scan(mac_addr_prefix, "%"SCNx8":%"SCNx8":%"SCNx8,
+ [0], [1], [2])) {
+mac_prefix = addr;
+}
+}
+
  cleanup_macam();
  }
  
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml

index c0739fe57..f309b3b86 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -102,6 +102,11 @@
tunnel interfaces.
  

+
+  
+Configure a given OUI to be used as prefix when L2 address is
+dynamically assigned, e.g. 00:11:22
+  
  
  
  

diff --git a/tests/ovn.at b/tests/ovn.at
index 8825beca3..e512f94aa 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5616,6 +5616,23 @@ AT_CHECK([ovn-nbctl get Logical-Switch-Port p41 
dynamic_addresses], [0],
   ["f0:00:00:00:10:2b 192.168.1.3"
  ])
  
+# define a mac address prefix

+ovn-nbctl ls-add sw6
+ovn-nbctl --wait=hv set NB_Global . options:mac_prefix="00:11:22:33:44:55"
+ovn-nbctl --wait=sb set Logical-Switch sw6 other_config:subnet=192.168.100.0/24
+for n in $(seq 1 3); do
+ovn-nbctl --wait=sb lsp-add sw6 "p5$n" -- lsp-set-addresses "p5$n" dynamic
+done
+AT_CHECK([ovn-nbctl get Logical-Switch-Port p51 dynamic_addresses], [0],
+["00:11:22:00:00:4d 192.168.100.2"
+])
+AT_CHECK([ovn-nbctl get Logical-Switch-Port p52 dynamic_addresses], [0],
+["00:11:22:00:00:4e 192.168.100.3"
+])
+AT_CHECK([ovn-nbctl get Logical-Switch-Port p53 dynamic_addresses], [0],
+["00:11:22:00:00:4f 192.168.100.4"
+])
+
  as ovn-sb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVN: configure L2 address according to the used IP address

2018-10-26 Thread 0-day Robot
Bleep bloop.  Greetings Lorenzo Bianconi, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
fatal: sha1 information is lacking or useless (ovn/northd/ovn-northd.c).
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0001 OVN: configure L2 address according to the used IP address
The copy of the patch that failed is found in:
   
/var/lib/jenkins/jobs/upstream_build_from_pw/workspace/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@bytheb.org

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v7 0/6] IPsec support for tunneling

2018-10-26 Thread Ben Pfaff
Hi Ansis.  Do you plan to further review this series or should I take a
look at it in hopes of merging it?

Thanks,

Ben.

On Wed, Sep 19, 2018 at 05:15:52PM -0400, Qiuyu Xiao wrote:
> This patch series reintroduce IPsec support for OVS tunneling and enable
> OVN to
> use IPsec tunnels. GRE, VXLAN, GENEVE, and STT IPsec tunnels are
> supported.
> StrongSwan and LibreSwan IKE daemons are supported.
> 
> Changes from v1 to v2
> -
> 1. Merge the ovs-monitor-ipsec code to a single patch. Add LibreSwan IKE
> daemon support.
> 2. Add ovs-monitor-ipsec to flake8 check.
> 3. Use openssl to extract CN from certificate so that users don't need
> to
> specify the CN information in the configuration interface.
> 4. Improve documentations as suggested.
> 
> Changes from v2 to v3
> -
> 1. Add scripts and rules to create ovs-ipsec RPM package.
> 2. Add Documentation/tutorials/ipsec.rst which gives a step-by-step OVS
> IPsec
> tutorial. Modify Documentation/howto/ipsec.rst which gives a detailed
> description on OVS IPsec configuration modes.
> 3. Modify ovs-pki to generate x.509 version 3 certificate when do
> self-sign.
> 4. IPsec tunnel interface needs 'local_ip' information. Modify
> ovn-controller
> to add 'local_ip' when IPsec is enabled.
> 5. Add a section on ovn/ovn-architecture.7.xml to introduce ovn IPsec.
> 
> Changes from v3 to v4
> -
> 1. Split the datapath patch to three patches (geneve, vxlan, stt).
> 2. Add tutorial for OVN RBAC and OVN IPsec.
> 
> Changes from v4 to v5
> -
> 1. Fix coding style issues in ovs-monitor-ipsec.
> 2. Improve IPsec and OVN-IPsec tutorials as suggested.
> 3. Change the description of setting skb_mark in documentation to
> reflect the
> real situation.
> 
> Changes from v5 to v6
> -
> 1. Use wildcard IP address to match localhost IP in LibreSwan. Remove
> the 'local_ip' requirement when setting IPsec tunnel interface.
> 2. ovs-monitor-ipsec daemon accepts command line option to choose IKE
> daemon, either LibreSwan or StrongSwan. The init script chooses which
> IKE daemon to use. Currently, Debian init script chooses StrongSwan.
> Fedora init script chooses LibreSwan.
> 3. Check illegal name before removing a file in
> '_import_local_certs_and_key()'.
> 4. GRE IPsec tunnel was not activated properly when using LibreSwan. This
> version fixes it.
> 5. The plaintext policy syntax was wrong when using LibreSwan. This version
> corrects it.
> 6. Add comments and explanations about the 'remote_name'
> check in '_is_valid_tunnel_conf()'.
> 7. Replace 'ike_daemon_start()' with 'ike_daemon_restart()' to start IKE
> daemon.
> 
> Changes from v6 to v7
> -
> 1. Use os.path.abspath to generate the path of the p12 file to make sure
> the path is under '/tmp/'.
> 2. When ovs-monitor-ipsec daemon restarts, check whether NSS database
> has old certificates and private keys set by previous run. If so, delete
> those old states.
> 
> *** BLURB HERE ***
> 
> Qiuyu Xiao (6):
>   datapath: add transport ports in route lookup for geneve
>   ipsec: reintroduce IPsec support for tunneling
>   debian and rhel: Create IPsec package.
>   Documentation: IPsec tunnel tutorial and documentation.
>   OVN: native support for tunnel encryption
>   Documentation: OVN RBAC and IPsec tutorial
> 
>  Documentation/automake.mk  |4 +
>  Documentation/howto/index.rst  |1 +
>  Documentation/howto/ipsec.rst  |  194 
>  Documentation/index.rst|5 +-
>  Documentation/tutorials/index.rst  |3 +
>  Documentation/tutorials/ipsec.rst  |  347 ++
>  Documentation/tutorials/ovn-ipsec.rst  |  146 +++
>  Documentation/tutorials/ovn-rbac.rst   |  134 +++
>  Makefile.am|1 +
>  datapath/linux/compat/geneve.c |   29 +-
>  debian/automake.mk |3 +
>  debian/control |   21 +
>  debian/openvswitch-ipsec.dirs  |1 +
>  debian/openvswitch-ipsec.init  |  181 +++
>  debian/openvswitch-ipsec.install   |1 +
>  ipsec/automake.mk  |   10 +
>  ipsec/ovs-monitor-ipsec| 1223 
> 
>  ovn/controller/encaps.c|   14 +-
>  ovn/controller/encaps.h|6 +-
>  ovn/controller/ovn-controller.c|3 +-
>  ovn/northd/ovn-northd.c|8 +-
>  ovn/ovn-architecture.7.xml |   39 +
>  ovn/ovn-nb.ovsschema   |7 +-
>  ovn/ovn-nb.xml |6 +
>  ovn/ovn-sb.ovsschema   |7 +-
>  

[ovs-dev] [PATCH 2/2] OVN: configure L2 address according to the used IP address

2018-10-26 Thread Lorenzo Bianconi
Configure L2 dynamic address according to used IPv4 address.
This patch allows to define a deterministic relationship between
L2 and L3 addresses when dynamic IPAM is used.
This patch allows to fix a possible L2/L3 address mismatch than can
occur when pods are created and destroyed at high rate [1] since if
there is no relation between MAC and IP addresses ARP cache can be
poisoned with a wrong correspondence

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1626217

Signed-off-by: Lorenzo Bianconi 
---
 ovn/northd/ovn-northd.c |  45 
 tests/ovn.at| 110 
 2 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 33fc94f72..f47738b3b 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1033,25 +1033,22 @@ ipam_add_port_addresses(struct ovn_datapath *od, struct 
ovn_port *op)
 }
 
 static uint64_t
-ipam_get_unused_mac(void)
+ipam_get_unused_mac(ovs_be32 ip)
 {
-/* Stores the suffix of the most recently ipam-allocated MAC address. */
-static uint32_t last_mac;
-
-uint64_t mac64;
+uint32_t mac_addr_suffix, i, base_addr = ntohl(ip) & MAC_ADDR_SPACE;
 struct eth_addr mac;
-uint32_t mac_addr_suffix, i;
+uint64_t mac64;
+
 for (i = 0; i < MAC_ADDR_SPACE - 1; i++) {
 /* The tentative MAC's suffix will be in the interval (1, 0xfe). */
-mac_addr_suffix = ((last_mac + i) % (MAC_ADDR_SPACE - 1)) + 1;
+mac_addr_suffix = ((base_addr + i) % (MAC_ADDR_SPACE - 1)) + 1;
 if (!eth_addr_is_zero(mac_prefix)) {
 mac64 =  eth_addr_to_uint64(mac_prefix) | mac_addr_suffix;
 } else {
 mac64 = MAC_ADDR_PREFIX | mac_addr_suffix;
 }
 eth_addr_from_uint64(mac64, );
-if (!ipam_is_duplicate_mac(, mac64, false)) {
-last_mac = mac_addr_suffix;
+if (!ipam_is_duplicate_mac(, mac64, true)) {
 break;
 }
 }
@@ -1287,21 +1284,6 @@ set_dynamic_updates(const char *addrspec,
 static void
 update_dynamic_addresses(struct dynamic_address_update *update)
 {
-struct eth_addr mac;
-switch (update->mac) {
-case NONE:
-mac = update->current_addresses.ea;
-break;
-case REMOVE:
-OVS_NOT_REACHED();
-case STATIC:
-mac = update->static_mac;
-break;
-case DYNAMIC:
-eth_addr_from_uint64(ipam_get_unused_mac(), );
-break;
-}
-
 ovs_be32 ip4 = 0;
 switch (update->ipv4) {
 case NONE:
@@ -1317,6 +1299,21 @@ update_dynamic_addresses(struct dynamic_address_update 
*update)
 ip4 = htonl(ipam_get_unused_ip(update->od));
 }
 
+struct eth_addr mac;
+switch (update->mac) {
+case NONE:
+mac = update->current_addresses.ea;
+break;
+case REMOVE:
+OVS_NOT_REACHED();
+case STATIC:
+mac = update->static_mac;
+break;
+case DYNAMIC:
+eth_addr_from_uint64(ipam_get_unused_mac(ip4), );
+break;
+}
+
 struct in6_addr ip6 = in6addr_any;
 switch (update->ipv6) {
 case NONE:
diff --git a/tests/ovn.at b/tests/ovn.at
index e512f94aa..ab32faa6b 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5312,7 +5312,7 @@ ovn-nbctl ls-add sw0
 ovn-nbctl lsp-add sw0 p0 -- lsp-set-addresses p0 dynamic
 ovn-nbctl --wait=sb add Logical-Switch sw0 other_config subnet=192.168.1.0/24
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p0 dynamic_addresses], [0],
-["0a:00:00:00:00:01 192.168.1.2"
+["0a:00:00:a8:01:03 192.168.1.2"
 ])
 
 # Add 9 more ports to sw0, addresses should all be unique.
@@ -5320,31 +5320,31 @@ for n in `seq 1 9`; do
 ovn-nbctl --wait=sb lsp-add sw0 "p$n" -- lsp-set-addresses "p$n" dynamic
 done
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p1 dynamic_addresses], [0],
-["0a:00:00:00:00:02 192.168.1.3"
+["0a:00:00:a8:01:04 192.168.1.3"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p2 dynamic_addresses], [0],
-["0a:00:00:00:00:03 192.168.1.4"
+["0a:00:00:a8:01:05 192.168.1.4"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p3 dynamic_addresses], [0],
-["0a:00:00:00:00:04 192.168.1.5"
+["0a:00:00:a8:01:06 192.168.1.5"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p4 dynamic_addresses], [0],
-["0a:00:00:00:00:05 192.168.1.6"
+["0a:00:00:a8:01:07 192.168.1.6"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p5 dynamic_addresses], [0],
-["0a:00:00:00:00:06 192.168.1.7"
+["0a:00:00:a8:01:08 192.168.1.7"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p6 dynamic_addresses], [0],
-["0a:00:00:00:00:07 192.168.1.8"
+["0a:00:00:a8:01:09 192.168.1.8"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p7 dynamic_addresses], [0],
-["0a:00:00:00:00:08 192.168.1.9"
+["0a:00:00:a8:01:0a 192.168.1.9"
 ])
 AT_CHECK([ovn-nbctl get Logical-Switch-Port p8 dynamic_addresses], [0],
-["0a:00:00:00:00:09 192.168.1.10"
+

[ovs-dev] [PATCH 1/2] OVN: assign new addresses at the end of build_ipam routine

2018-10-26 Thread Lorenzo Bianconi
Visit all ovn datapaths before adding new dynamic addresses to the
system in order to avoid possible L2 address duplication when
the same MAC address is configured on different ovn logical switches.
Current implementation can miss the duplicated address since macam
is cleared at each ovn run and there is no guarantee on visit order
of ovn datapath hash table

Signed-off-by: Lorenzo Bianconi 
---
 ovn/northd/ovn-northd.c | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 816c72311..33fc94f72 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -1093,6 +1093,7 @@ enum dynamic_update_type {
 struct dynamic_address_update {
 struct ovs_list node;   /* In build_ipam()'s list of updates. */
 
+struct ovn_datapath *od;
 struct ovn_port *op;
 
 struct lport_addresses current_addresses;
@@ -1284,8 +1285,7 @@ set_dynamic_updates(const char *addrspec,
 }
 
 static void
-update_dynamic_addresses(struct ovn_datapath *od,
- struct dynamic_address_update *update)
+update_dynamic_addresses(struct dynamic_address_update *update)
 {
 struct eth_addr mac;
 switch (update->mac) {
@@ -1314,7 +1314,7 @@ update_dynamic_addresses(struct ovn_datapath *od,
 case STATIC:
 OVS_NOT_REACHED();
 case DYNAMIC:
-ip4 = htonl(ipam_get_unused_ip(od));
+ip4 = htonl(ipam_get_unused_ip(update->od));
 }
 
 struct in6_addr ip6 = in6addr_any;
@@ -1329,14 +1329,14 @@ update_dynamic_addresses(struct ovn_datapath *od,
 case STATIC:
 OVS_NOT_REACHED();
 case DYNAMIC:
-in6_generate_eui64(mac, >ipam_info.ipv6_prefix, );
+in6_generate_eui64(mac, >od->ipam_info.ipv6_prefix, );
 break;
 }
 
 struct ds new_addr = DS_EMPTY_INITIALIZER;
 ds_put_format(_addr, ETH_ADDR_FMT, ETH_ADDR_ARGS(mac));
 if (ip4) {
-ipam_insert_ip(od, ntohl(ip4));
+ipam_insert_ip(update->od, ntohl(ip4));
 ds_put_format(_addr, " "IP_FMT, IP_ARGS(ip4));
 }
 if (!IN6_ARE_ADDR_EQUAL(, _any)) {
@@ -1362,13 +1362,14 @@ build_ipam(struct hmap *datapaths, struct hmap *ports)
 /* If the switch's other_config:subnet is set, allocate new addresses for
  * ports that have the "dynamic" keyword in their addresses column. */
 struct ovn_datapath *od;
+struct ovs_list updates;
+
+ovs_list_init();
 HMAP_FOR_EACH (od, key_node, datapaths) {
 if (!od->nbs) {
 continue;
 }
 
-struct ovs_list updates;
-ovs_list_init();
 for (size_t i = 0; i < od->nbs->n_ports; i++) {
 const struct nbrec_logical_switch_port *nbsp = od->nbs->ports[i];
 
@@ -1405,6 +1406,7 @@ build_ipam(struct hmap *datapaths, struct hmap *ports)
 struct dynamic_address_update *update
 = xzalloc(sizeof *update);
 update->op = op;
+update->od = od;
 if (nbsp->dynamic_addresses) {
 bool any_changed;
 extract_lsp_addresses(nbsp->dynamic_addresses,
@@ -1431,15 +1433,16 @@ build_ipam(struct hmap *datapaths, struct hmap *ports)
 }
 }
 
-/* After retaining all unchanged dynamic addresses, now assign
- * new ones.
- */
-struct dynamic_address_update *update;
-LIST_FOR_EACH_POP (update, node, ) {
-update_dynamic_addresses(od, update);
-destroy_lport_addresses(>current_addresses);
-free(update);
-}
+}
+
+/* After retaining all unchanged dynamic addresses, now assign
+ * new ones.
+ */
+struct dynamic_address_update *update;
+LIST_FOR_EACH_POP (update, node, ) {
+update_dynamic_addresses(update);
+destroy_lport_addresses(>current_addresses);
+free(update);
 }
 }
 
-- 
2.17.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/2] configure mac address according to IP address

2018-10-26 Thread Lorenzo Bianconi
When dynamic IPAM is used configure L2 address according to the used
IPv4 address.
Moreover visit all ovn datapaths before adding new dynamic addresses to the
system in order to avoid possible L2 address duplication

This patchset is based on top of 'OVN: introduce mac_prefix support to IPAM'
https://mail.openvswitch.org/pipermail/ovs-dev/2018-October/353327.html

Lorenzo Bianconi (2):
  OVN: assign new addresses at the end of build_ipam routine
  OVN: configure L2 address according to the used IP address

 ovn/northd/ovn-northd.c |  80 ++---
 tests/ovn.at| 110 
 2 files changed, 95 insertions(+), 95 deletions(-)

-- 
2.17.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2] OVN: introduce mac_prefix support to IPAM

2018-10-26 Thread Lorenzo Bianconi
Add the possibility to specify a given mac address prefix for
dynamically generated mac address. Mac address prefix can be
specified in nbdb NB_Global table, options:mac_prefix=
This patch fix a possible issue of L2 address duplication if
multiple OVN deployments share a single broadcast domain

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
Changes since v1:
- use a global definition for mac_prefix
---
 ovn/northd/ovn-northd.c | 37 ++---
 ovn/ovn-nb.xml  |  5 +
 tests/ovn.at| 17 +
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 439651f80..816c72311 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -68,6 +68,7 @@ static const char *unixctl_path;
 /* MAC address management (macam) table of "struct eth_addr"s, that holds the
  * MAC addresses allocated by the OVN ipam module. */
 static struct hmap macam = HMAP_INITIALIZER();
+static struct eth_addr mac_prefix;
 
 #define MAX_OVN_TAGS 4096
 
@@ -922,10 +923,17 @@ ipam_insert_mac(struct eth_addr *ea, bool check)
 }
 
 uint64_t mac64 = eth_addr_to_uint64(*ea);
+uint64_t prefix;
+
+if (!eth_addr_is_zero(mac_prefix)) {
+prefix = eth_addr_to_uint64(mac_prefix);
+} else {
+prefix = MAC_ADDR_PREFIX;
+}
 /* If the new MAC was not assigned by this address management system or
  * check is true and the new MAC is a duplicate, do not insert it into the
  * macam hmap. */
-if (((mac64 ^ MAC_ADDR_PREFIX) >> 24)
+if (((mac64 ^ prefix) >> 24)
 || (check && ipam_is_duplicate_mac(ea, mac64, true))) {
 return;
 }
@@ -1036,7 +1044,11 @@ ipam_get_unused_mac(void)
 for (i = 0; i < MAC_ADDR_SPACE - 1; i++) {
 /* The tentative MAC's suffix will be in the interval (1, 0xfe). */
 mac_addr_suffix = ((last_mac + i) % (MAC_ADDR_SPACE - 1)) + 1;
-mac64 = MAC_ADDR_PREFIX | mac_addr_suffix;
+if (!eth_addr_is_zero(mac_prefix)) {
+mac64 =  eth_addr_to_uint64(mac_prefix) | mac_addr_suffix;
+} else {
+mac64 = MAC_ADDR_PREFIX | mac_addr_suffix;
+}
 eth_addr_from_uint64(mac64, );
 if (!ipam_is_duplicate_mac(, mac64, false)) {
 last_mac = mac_addr_suffix;
@@ -1107,7 +1119,15 @@ dynamic_mac_changed(const char *lsp_addresses,
}
 
uint64_t mac64 = eth_addr_to_uint64(update->current_addresses.ea);
-   if ((mac64 ^ MAC_ADDR_PREFIX) >> 24) {
+   uint64_t prefix;
+
+   if (!eth_addr_is_zero(mac_prefix)) {
+   prefix = eth_addr_to_uint64(mac_prefix);
+   } else {
+   prefix = MAC_ADDR_PREFIX;
+   }
+
+   if ((mac64 ^ prefix) >> 24) {
return DYNAMIC;
} else {
return NONE;
@@ -7141,6 +7161,17 @@ ovnnb_db_run(struct northd_context *ctx,
 sbrec_sb_global_set_options(sb, >options);
 sb_loop->next_cfg = nb->nb_cfg;
 
+const char *mac_addr_prefix = smap_get(>options, "mac_prefix");
+if (mac_addr_prefix) {
+struct eth_addr addr;
+
+memset(, 0, sizeof addr);
+if (ovs_scan(mac_addr_prefix, "%"SCNx8":%"SCNx8":%"SCNx8,
+ [0], [1], [2])) {
+mac_prefix = addr;
+}
+}
+
 cleanup_macam();
 }
 
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index c0739fe57..f309b3b86 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -102,6 +102,11 @@
   tunnel interfaces.
 
   
+
+  
+Configure a given OUI to be used as prefix when L2 address is
+dynamically assigned, e.g. 00:11:22
+  
 
 
 
diff --git a/tests/ovn.at b/tests/ovn.at
index 8825beca3..e512f94aa 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -5616,6 +5616,23 @@ AT_CHECK([ovn-nbctl get Logical-Switch-Port p41 
dynamic_addresses], [0],
  ["f0:00:00:00:10:2b 192.168.1.3"
 ])
 
+# define a mac address prefix
+ovn-nbctl ls-add sw6
+ovn-nbctl --wait=hv set NB_Global . options:mac_prefix="00:11:22:33:44:55"
+ovn-nbctl --wait=sb set Logical-Switch sw6 other_config:subnet=192.168.100.0/24
+for n in $(seq 1 3); do
+ovn-nbctl --wait=sb lsp-add sw6 "p5$n" -- lsp-set-addresses "p5$n" dynamic
+done
+AT_CHECK([ovn-nbctl get Logical-Switch-Port p51 dynamic_addresses], [0],
+["00:11:22:00:00:4d 192.168.100.2"
+])
+AT_CHECK([ovn-nbctl get Logical-Switch-Port p52 dynamic_addresses], [0],
+["00:11:22:00:00:4e 192.168.100.3"
+])
+AT_CHECK([ovn-nbctl get Logical-Switch-Port p53 dynamic_addresses], [0],
+["00:11:22:00:00:4f 192.168.100.4"
+])
+
 as ovn-sb
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
-- 
2.17.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] OVN: introduce mac_prefix support to IPAM

2018-10-26 Thread Lorenzo Bianconi
>
> Hi Lorenzo,
>
> I think for the patch, I'm willing to ack it, so:
>
> Acked-by: Mark Michelson 
>
> However,  I have some minor suggestions below that might be worth making
> a second version of the patch for.

Thx for the review Mark, I will address your comments and send a v2

Regards,
Lorenzo
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] OVN: introduce mac_prefix support to IPAM

2018-10-26 Thread Mark Michelson

Hi Lorenzo,

I think for the patch, I'm willing to ack it, so:

Acked-by: Mark Michelson 

However,  I have some minor suggestions below that might be worth making 
a second version of the patch for.


On 10/23/2018 06:01 AM, Lorenzo Bianconi wrote:

Add the possibility to specify a given mac address prefix for
dynamically generated mac address. Mac address prefix can be
specified in nbdb NB_Global table, options:mac_prefix=
This patch fix a possible issue of L2 address duplication if
multiple OVN deployments share a single broadcast domain

Signed-off-by: Lorenzo Bianconi 
---
  ovn/northd/ovn-northd.c | 76 -
  ovn/ovn-nb.xml  |  5 +++
  tests/ovn.at| 17 +
  3 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 439651f80..3e8a4a276 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -404,6 +404,9 @@ struct ipam_info {
  unsigned long *allocated_ipv4s; /* A bitmap of allocated IPv4s */
  bool ipv6_prefix_set;
  struct in6_addr ipv6_prefix;
+bool mac_prefix_set;
+struct eth_addr mac_prefix;
+
  };
  
  /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or

@@ -534,7 +537,8 @@ lrouter_is_enabled(const struct nbrec_logical_router 
*lrouter)
  }
  
  static void

-init_ipam_info_for_datapath(struct ovn_datapath *od)
+init_ipam_info_for_datapath(struct ovn_datapath *od,
+struct northd_context *ctx)
  {
  if (!od->nbs) {
  return;
@@ -543,11 +547,27 @@ init_ipam_info_for_datapath(struct ovn_datapath *od)
  const char *subnet_str = smap_get(>nbs->other_config, "subnet");
  const char *ipv6_prefix = smap_get(>nbs->other_config, "ipv6_prefix");
  
+const struct nbrec_nb_global *nb = nbrec_nb_global_first(ctx->ovnnb_idl);

+const char *mac_prefix = smap_get(>options, "mac_prefix");
+
  if (ipv6_prefix) {
  od->ipam_info.ipv6_prefix_set = ipv6_parse(
  ipv6_prefix, >ipam_info.ipv6_prefix);
  }
  
+if (mac_prefix) {

+struct eth_addr addr;
+
+memset(, 0, sizeof addr);
+if (ovs_scan(mac_prefix, "%"SCNx8":%"SCNx8":%"SCNx8,
+ [0], [1], [2])) {
+od->ipam_info.mac_prefix_set = true;
+od->ipam_info.mac_prefix = addr > +} else {
+od->ipam_info.mac_prefix_set = false;
+}
+}


This operation will be exactly the same for every datapath since the 
mac_prefix is a global setting. It seems wasteful to do the same thing 
on every datapath. Why not parse the configured mac_prefix once and 
store it as a global in ovn-northd?



+
  if (!subnet_str) {
  return;
  }
@@ -703,7 +723,7 @@ join_datapaths(struct northd_context *ctx, struct hmap 
*datapaths,
  ovs_list_push_back(nb_only, >list);
  }
  
-init_ipam_info_for_datapath(od);

+init_ipam_info_for_datapath(od, ctx);
  }
  
  const struct nbrec_logical_router *nbr;

@@ -915,17 +935,24 @@ ipam_is_duplicate_mac(struct eth_addr *ea, uint64_t 
mac64, bool warn)
  }
  
  static void

-ipam_insert_mac(struct eth_addr *ea, bool check)
+ipam_insert_mac(struct ovn_datapath *od, struct eth_addr *ea, bool check)


I believe od can be made const.


  {
  if (!ea) {
  return;
  }
  
  uint64_t mac64 = eth_addr_to_uint64(*ea);

+uint64_t prefix;
+
+if (od->ipam_info.mac_prefix_set) {
+prefix = eth_addr_to_uint64(od->ipam_info.mac_prefix);
+} else {
+prefix = MAC_ADDR_PREFIX;
+}
  /* If the new MAC was not assigned by this address management system or
   * check is true and the new MAC is a duplicate, do not insert it into the
   * macam hmap. */
-if (((mac64 ^ MAC_ADDR_PREFIX) >> 24)
+if (((mac64 ^ prefix) >> 24)
  || (check && ipam_is_duplicate_mac(ea, mac64, true))) {
  return;
  }
@@ -970,7 +997,7 @@ ipam_insert_lsp_addresses(struct ovn_datapath *od, struct 
ovn_port *op,
  VLOG_WARN_RL(, "Extract addresses failed.");
  return;
  }
-ipam_insert_mac(, true);
+ipam_insert_mac(od, , true);
  
  /* IP is only added to IPAM if the switch's subnet option

   * is set, whereas MAC is always added to MACAM. */
@@ -1007,7 +1034,7 @@ ipam_add_port_addresses(struct ovn_datapath *od, struct 
ovn_port *op)
  VLOG_WARN_RL(, "Extract addresses failed.");
  return;
  }
-ipam_insert_mac(_networks.ea, true);
+ipam_insert_mac(od, _networks.ea, true);
  
  if (!op->peer || !op->peer->nbsp || !op->peer->od || !op->peer->od->nbs

  || !smap_get(>peer->od->nbs->other_config, "subnet")) {
@@ -1025,7 +1052,7 @@ ipam_add_port_addresses(struct ovn_datapath *od, struct 
ovn_port *op)
  }
  
  static uint64_t

-ipam_get_unused_mac(void)
+ipam_get_unused_mac(struct ovn_datapath *od)


I believe od can be 

[ovs-dev] d...@openvswitch.org Swift Copy

2018-10-26 Thread INTERNATIONAL FREIGHT CO., LTD
dear...@openvswitch.org ___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 2/2] netdev-dpdk: Add link speed to get_status().

2018-10-26 Thread Ilya Maximets
> Report the link speed of the device in netdev_dpdk_get_status()
> function.
> 
> Link speed is already reported as part of the netdev_get_features()
> function. However only link speeds defined in the OpenFlow specs are
> supported so speeds such as 25 Gbps etc. are not shown. The link
> speed for the device is available in Mbps in rte_eth_link.
> This commit converts the link speed for a given dpdk device to an
> easy to read string and reports it in get_status().

Hi Ian,
Do we really need to convert the speed from Mbps to Gbps? I'm not sure
about that.
Anyway, you don't need to snprintf the constant strings. You may
just return them like netdev_feature_to_name() does.
Another option to reduce the code size is to use few 'if's and calculate
the printed value mathematically.

> 
> Suggested-by: Flavio Leitner 
> Signed-off-by: Ian Stokes 
> ---
>  lib/netdev-dpdk.c | 62 
> +++
>  1 file changed, 62 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 35eb30f8d..b3a2430a0 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -164,6 +164,8 @@ typedef uint16_t dpdk_port_t;
>  #define VHOST_ENQ_RETRY_NUM 8
>  #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
>  
> +#define MAX_LINK_SPEED_STR 12
> +
>  static const struct rte_eth_conf port_conf = {
>  .rxmode = {
>  .mq_mode = ETH_MQ_RX_RSS,
> @@ -3022,11 +3024,62 @@ netdev_dpdk_vhost_user_get_status(const struct netdev 
> *netdev,
>  return 0;
>  }
>  
> +/*
> + * Convert a given uint32_t link speed defined in DPDK to a string
> + * equivalent.
> + */
> +static void
> +netdev_dpdk_link_speed_to_str__(uint32_t link_speed, char *speed_str)
> +{
> +switch (link_speed) {
> +case ETH_SPEED_NUM_10M:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "10 Mbps");
> +break;
> +case ETH_SPEED_NUM_100M:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "100 Mbps");
> +break;
> +case ETH_SPEED_NUM_1G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "1 Gbps");
> +break;
> +case ETH_SPEED_NUM_2_5G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "2.5 Gbps");
> +break;
> +case ETH_SPEED_NUM_5G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "5 Gbps");
> +break;
> +case ETH_SPEED_NUM_10G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "10 Gbps");
> +break;
> +case ETH_SPEED_NUM_20G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "20 Gbps");
> +break;
> +case ETH_SPEED_NUM_25G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "25 Gbps");
> +break;
> +case ETH_SPEED_NUM_40G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "40 Gbps");
> +break;
> +case ETH_SPEED_NUM_50G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "50 Gbps");
> +break;
> +case ETH_SPEED_NUM_56G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "56 Gbps");
> +break;
> +case ETH_SPEED_NUM_100G:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "100 Gbps");
> +break;
> +default:
> +snprintf(speed_str, MAX_LINK_SPEED_STR,"%s", "Not Defined");
> +}
> +}
> +
>  static int
>  netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  struct rte_eth_dev_info dev_info;
> +struct rte_eth_link link;
> +char link_speed_str [MAX_LINK_SPEED_STR];

Extra space not needed before the '['.

>  
>  if (!rte_eth_dev_is_valid_port(dev->port_id)) {
>  return ENODEV;
> @@ -3034,6 +3087,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, 
> struct smap *args)
>  
>  ovs_mutex_lock(>mutex);
>  rte_eth_dev_info_get(dev->port_id, _info);
> +link = dev->link;
>  ovs_mutex_unlock(>mutex);
>  
>  smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id);
> @@ -3065,6 +3119,14 @@ netdev_dpdk_get_status(const struct netdev *netdev, 
> struct smap *args)
>  dev_info.pci_dev->id.device_id);
>  }
>  
> +/* Not all link speeds are defined in the OpenFlow specs e.g. 25 Gbps.
> + * In that case the speed will not be reported as part of the usual
> + * call to get_features(). Get the link speed of the device and add it
> + * to the device status in an easy to read string format.
> + */
> +netdev_dpdk_link_speed_to_str__(link.link_speed, link_speed_str);
> +smap_add_format(args, "link_speed", "%s", link_speed_str);
> +
>  return 0;
>  }
>  
> -- 
> 2.13.6

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 1/2] netdev-dpdk: Fix netdev_dpdk_get_features().

2018-10-26 Thread Ilya Maximets
> This commit fixes netdev_dpdk_get_features() by initializing a bitmap
> that represents current features to zero and accounting for non defined
> link speed values in the OpenFlow spec.
> 
> The current approach for retrieving netdev dpdk features uses a
> pointer allocated in the stack without being initialized. As such there
> is no guarantee that the bitmap will be accurate. Fix this by declaring
> and initializing local variable 'feature' to be used when building the
> bitmap, with its value then assigned to the pointer. Also account for
> link speeds not defined in the OpenFlow spec by defaulting to
> NETDEV_F_OTHER for undefined link speeds.
> 
> Fixes: 8a9562d21a40 ("dpif-netdev: Add DPDK netdev.")
> Signed-off-by: Ian Stokes 
> ---
> Flavio, this patch is based on suggestions from you
> 
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-September/351809.html
> 
> I'd like to add
> 
> Co-authored-by: Flavio Leitner 
> Signed-off-by: Flavio Leitner 
> 
> if you agree to sign off on this.
> ---
>  lib/netdev-dpdk.c | 65 
> ++-
>  1 file changed, 40 insertions(+), 25 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f91aa27cd..35eb30f8d 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2707,43 +2707,58 @@ netdev_dpdk_get_features(const struct netdev *netdev,
>  {
>  struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>  struct rte_eth_link link;
> +uint32_t feature = 0;
>  
>  ovs_mutex_lock(>mutex);
>  link = dev->link;
>  ovs_mutex_unlock(>mutex);
>  
> -if (link.link_duplex == ETH_LINK_HALF_DUPLEX) {
> -if (link.link_speed == ETH_SPEED_NUM_10M) {
> -*current = NETDEV_F_10MB_HD;
> -}
> -if (link.link_speed == ETH_SPEED_NUM_100M) {
> -*current = NETDEV_F_100MB_HD;
> -}
> -if (link.link_speed == ETH_SPEED_NUM_1G) {
> -*current = NETDEV_F_1GB_HD;
> -}
> -} else if (link.link_duplex == ETH_LINK_FULL_DUPLEX) {
> -if (link.link_speed == ETH_SPEED_NUM_10M) {
> -*current = NETDEV_F_10MB_FD;
> -}
> -if (link.link_speed == ETH_SPEED_NUM_100M) {
> -*current = NETDEV_F_100MB_FD;
> -}
> -if (link.link_speed == ETH_SPEED_NUM_1G) {
> -*current = NETDEV_F_1GB_FD;
> -}
> -if (link.link_speed == ETH_SPEED_NUM_10G) {
> -*current = NETDEV_F_10GB_FD;
> +if (link.link_duplex == ETH_LINK_FULL_DUPLEX) {
> +switch (link.link_speed) {
> +/* OpenFlow defined values: see enum ofp_port_features */

This comment should be not only for full duplex case.
Also, I'm not sure about referring to enum ofp_port_features, because it
not used here.

> +case ETH_SPEED_NUM_10M:
> +feature |= NETDEV_F_10MB_FD;
> +break;
> +case ETH_SPEED_NUM_100M:
> +feature |= NETDEV_F_100MB_FD;
> +break;
> +case ETH_SPEED_NUM_1G:
> +feature |= NETDEV_F_1GB_FD;
> +break;
> +case ETH_SPEED_NUM_10G:
> +feature |= NETDEV_F_10GB_FD;
> +break;
> +case ETH_SPEED_NUM_40G:
> +feature |= NETDEV_F_40GB_FD;
> +break;
> +case ETH_SPEED_NUM_100G:
> +feature |= NETDEV_F_100GB_FD;
> +break;
> +default:
> +feature |= NETDEV_F_OTHER;
>  }
> -if (link.link_speed == ETH_SPEED_NUM_40G) {
> -*current = NETDEV_F_40GB_FD;
> +}
> +else if (link.link_duplex == ETH_LINK_HALF_DUPLEX) {

Could you place 'else if' on the same line with '}' ?

> +switch (link.link_speed) {
> +case ETH_SPEED_NUM_10M:
> +feature |= NETDEV_F_10MB_HD;
> +break;
> +case ETH_SPEED_NUM_100M:
> +feature |= NETDEV_F_100MB_HD;
> +break;
> +case ETH_SPEED_NUM_1G:
> +feature |= NETDEV_F_1GB_HD;
> +break;
> +default:
> +feature |= NETDEV_F_OTHER;
>  }
>  }
>  
>  if (link.link_autoneg) {
> -*current |= NETDEV_F_AUTONEG;
> +feature |= NETDEV_F_AUTONEG;
>  }
>  
> +*current = feature;
>  *advertised = *supported = *peer = 0;
>  
>  return 0;
> -- 
> 2.13.6

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] RFC for support of PMD Auto load balancing

2018-10-26 Thread Kevin Traynor
Hi Nitin,

Thanks for your work on this and sharing the RFC. Initial comments below.

On 10/11/2018 08:59 PM, Nitin Katiyar wrote:
> Port rx queues that have not been statically assigned to PMDs are currently
> assigned based on periodically sampled load measurements.
> The assignment is performed at specific instances – port addition, port
> deletion, upon reassignment request via CLI etc.
> 
> Over time it can cause uneven load among the PMDs due to change in traffic
> pattern and thus resulting in lower overall throughout.
> 
> This patch enables the support of auto load balancing of PMDs based
> on measured load of RX queues. Each PMD measures the processing load for
> each of its associated queues every 10 seconds. If the aggregated PMD load
> exceeds a configured threshold for 6 consecutive intervals and if there are
> receive packet drops at the NIC the PMD considers itself to be overloaded.
> 
> If any PMD considers itself to be overloaded, a dry-run of the PMD
> assignment algorithm is performed by OVS main thread. The dry-run
> does NOT change the existing queue to PMD assignments.
> 
> If the resultant mapping of dry-run indicates an improved distribution
> of the load then the actual reassignment will be performed. The automatic
> rebalancing will be disabled by default and has to be enabled via
> configuration option. Load thresholds, improvement factor etc are also
> configurable.
> 
> Following example commands can be used to set the auto-lb params:
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb="true"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-thresh="80"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-min-improvement="5"
> ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-drop-check="true"
> 

As you mentioned in follow up, this will never be perfect, so there
needs to be a way that the user can limit it happening even if the
criteria above is met. Something like allowing the user to set a max
number of rebalances for some time period. e.g. pmd-auto-lb-max-num and
pmd-auto-lb-time.

> Co-authored-by: Rohith Basavaraja 
> 
> Signed-off-by: Nitin Katiyar 
> Signed-off-by: Rohith Basavaraja 
> ---
>  lib/dpif-netdev.c | 589 
> +++---
>  1 file changed, 561 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e322f55..28593cc 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -80,6 +80,26 @@
>  
>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>  
> +/* Auto Load Balancing Defaults */
> +#define ACCEPT_IMPROVE_DEFAULT   (25)
> +#define PMD_LOAD_THRE_DEFAULT(99)
> +#define PMD_AUTO_LB_DISABLE  false
> +#define SKIP_DROP_CHECK_DEFAULT  false
> +
> +//TODO: Should we make it configurable??
> +#define PMD_MIN_NUM_DROPS(1)
> +#define PMD_MIN_NUM_QFILLS   (1)

It seems like a very small default. This would indicate that one dropped
pkt or one vhost q full is considered enough to do a dry run.

> +#define PMD_REBALANCE_POLL_TIMER_INTERVAL 6
> +
> +extern uint32_t log_q_thr;
> +
> +static bool pmd_auto_lb = PMD_AUTO_LB_DISABLE;
> +static bool auto_lb_skip_drop_check = SKIP_DROP_CHECK_DEFAULT;
> +static float auto_lb_pmd_load_ther = PMD_LOAD_THRE_DEFAULT;
> +static unsigned int auto_lb_accept_improve = ACCEPT_IMPROVE_DEFAULT;
> +static long long int pmd_rebalance_poll_timer = 0;
> +

I think these can be in 'struct dp_netdev' like the other config items
instead of globals

> +
>  #define FLOW_DUMP_MAX_BATCH 50
>  /* Use per thread recirc_depth to prevent recirculation loop. */
>  #define MAX_RECIRC_DEPTH 6
> @@ -393,6 +413,8 @@ enum rxq_cycles_counter_type {
> interval. */
>  RXQ_CYCLES_PROC_HIST,   /* Total cycles of all intervals that are 
> used
> during rxq to pmd assignment. */
> +RXQ_CYCLES_IDLE_CURR,   /* Cycles spent in idling. */
> +RXQ_CYCLES_IDLE_HIST,   /* Total cycles of all idle intervals. */

I'm not sure if it's really needed to measure this, or you can just use
'pmd->intrvl_cycles - sum(rxq intrvl's on that pmd)' like in
pmd_info_show_rxq(). It would be worth to try with that and see if
there's any noticable difference.

>  RXQ_N_CYCLES
>  };
>  
> @@ -429,6 +451,14 @@ static struct ovsthread_once offload_thread_once
>  
>  #define XPS_TIMEOUT 50LL/* In microseconds. */
>  
> +typedef struct {
> +unsigned long long prev_drops;
> +} q_drops;

Doesn't seem like the struct is needed here

> +typedef struct {
> +unsigned int num_vhost_qfill;
> +unsigned int prev_num_vhost_qfill;
> +} vhost_qfill;
> +
>  /* Contained by struct dp_netdev_port's 'rxqs' member.  */
>  struct dp_netdev_rxq {
>  struct dp_netdev_port *port;
> @@ -439,6 +469,10 @@ struct dp_netdev_rxq {
>particular core. */
>  unsigned intrvl_idx;   /* Write index for 'cycles_intrvl'. */
>  

Re: [ovs-dev] [PATCH v8 1/3] dpif-netlink: Detect Out-Of-Resource condition on a netdev

2018-10-26 Thread Eelco Chaudron


On 26 Oct 2018, at 11:53, Simon Horman wrote:

> On Fri, 26 Oct 2018 at 08:58, Eelco Chaudron  wrote:
>


 I have a general comment, don't know where to put it, so I put it
 here.
 Some hardware might have multiple tables. If one type of table is
 full
 the ENOSPC might be returned, but it does not mean all type of flows
 can
 no longer be offloaded. This might be a situation to think about.
>>>
>>> Ok, thanks for bringing it up. Currently from OvS daemon's perspective
>>> a
>>> request to add/delete a flow is issued on a netdev and the failure
>>> indicates
>>> that the particular netdev is out of resources. If we need to handle
>>> the
>>> condition where HW has different tables, we need to further extend
>>> this
>>> design and the tc interfaces to propagate this fine grained
>>> information.
>>
>> Would be good if other hardware vendors can comment here?
>>
>
> There was a discussion in another forum involving at least Mellanox,
> Broadcom and Netronome.
> From a Netronome point of view this scheme is satisfactory and my
> recollection is that
> was the agreement of those involved in the discussion.

Thanks for the clarification…
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v8 1/3] dpif-netlink: Detect Out-Of-Resource condition on a netdev

2018-10-26 Thread Simon Horman
On Fri, 26 Oct 2018 at 08:58, Eelco Chaudron  wrote:

>
>
> On 25 Oct 2018, at 16:00, Sriharsha Basavapatna wrote:
>
> > Hi Eelco,
> >
> > Thanks for your comments, please see my response below.
> > On Fri, Oct 19, 2018 at 7:52 PM Eelco Chaudron 
> > wrote:
> >>
> >> On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:
> >>
> >>> This is the first patch in the patch-set to support dynamic
> >>> rebalancing
> >>> of offloaded flows.
> >>>
> >>> The patch detects OOR condition on a netdev port when ENOSPC error
> >>> is
> >>> returned by TC-Flower while adding a flow rule. A new structure is
> >>> added
> >>> to the netdev called "netdev_hw_info", to store OOR related
> >>> information
> >>> required to perform dynamic offload-rebalancing.
> >>>
> >>> Signed-off-by: Sriharsha Basavapatna
> >>> 
> >>> Co-authored-by: Venkat Duvvuru 
> >>> Signed-off-by: Venkat Duvvuru 
> >>> Reviewed-by: Sathya Perla 
> >>> Reviewed-by: Simon Horman 
> >>> Reviewed-by: Ben Pfaff 
> >>> ---
> >>>  lib/dpif-netlink.c| 18 +-
> >>>  lib/flow.c| 25 +
> >>>  lib/flow.h|  1 +
> >>>  lib/netdev-provider.h | 11 +++
> >>>  lib/netdev.c  | 34 ++
> >>>  lib/netdev.h  |  3 +++
> >>>  6 files changed, 91 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> >>> index e6d5a6ec5..b9ce9cbe2 100644
> >>> --- a/lib/dpif-netlink.c
> >>> +++ b/lib/dpif-netlink.c
> >>> @@ -2178,7 +2178,23 @@ parse_flow_put(struct dpif_netlink *dpif,
> >>> struct dpif_flow_put *put)
> >>>
> >>>  VLOG_DBG("added flow");
> >>>  } else if (err != EEXIST) {
> >>> -VLOG_ERR_RL(, "failed to offload flow: %s",
> >>> ovs_strerror(err));
> >>> +struct netdev *oor_netdev = NULL;
> >>> +if (err == ENOSPC &&
> >>> netdev_is_offload_rebalance_policy_enabled()) {
> >>> +/*
> >>> + * We need to set OOR on the input netdev (i.e, 'dev')
> >>> for the
> >>> + * flow. But if the flow has a tunnel attribute (i.e,
> >>> decap action,
> >>> + * with a virtual device like a VxLAN interface as its
> >>> in-port),
> >>> + * then lookup and set OOR on the underlying tunnel
> >>> (real) netdev.
> >>> + */
> >>> +oor_netdev =
> >>> flow_get_tunnel_netdev();
> >>> +if (!oor_netdev) {
> >>> +/* Not a 'tunnel' flow */
> >>> +oor_netdev = dev;
> >>> +}
> >>> +netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true);
> >>
> >> Why not just oor_netdev->hw_info.oor = true, see also below.
> >
> > The original code was directly accessing netdev members. It was
> > changed
> > based on a review comment to avoid direct access and add an interface.
> >
> >>
> >> I have a general comment, don't know where to put it, so I put it
> >> here.
> >> Some hardware might have multiple tables. If one type of table is
> >> full
> >> the ENOSPC might be returned, but it does not mean all type of flows
> >> can
> >> no longer be offloaded. This might be a situation to think about.
> >
> > Ok, thanks for bringing it up. Currently from OvS daemon's perspective
> > a
> > request to add/delete a flow is issued on a netdev and the failure
> > indicates
> > that the particular netdev is out of resources. If we need to handle
> > the
> > condition where HW has different tables, we need to further extend
> > this
> > design and the tc interfaces to propagate this fine grained
> > information.
>
> Would be good if other hardware vendors can comment here?
>

There was a discussion in another forum involving at least Mellanox,
Broadcom and Netronome.
>From a Netronome point of view this scheme is satisfactory and my
recollection is that
was the agreement of those involved in the discussion.


>
> >>
> >>> +}
> >>> +VLOG_ERR_RL(, "failed to offload flow: %s: %s",
> >>> ovs_strerror(err),
> >>> +(oor_netdev ? oor_netdev->name : dev->name));
> >>>  }
> >>>
> >>>  out:
> >>> diff --git a/lib/flow.c b/lib/flow.c
> >>> index 77ed3d9df..a39807908 100644
> >>> --- a/lib/flow.c
> >>> +++ b/lib/flow.c
> >>> @@ -19,6 +19,7 @@
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>> +#include 
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>> @@ -41,6 +42,8 @@
> >>>  #include "unaligned.h"
> >>>  #include "util.h"
> >>>  #include "openvswitch/nsh.h"
> >>> +#include "ovs-router.h"
> >>> +#include "lib/netdev-provider.h"
> >>>
> >>>  COVERAGE_DEFINE(flow_extract);
> >>>  COVERAGE_DEFINE(miniflow_malloc);
> >>> @@ -3403,3 +3406,25 @@ flow_limit_vlans(int vlan_limit)
> >>>  flow_vlan_limit = MIN(vlan_limit, FLOW_MAX_VLAN_HEADERS);
> >>>  }
> >>>  }
> >>> +
> >>> +struct netdev *
> >>> +flow_get_tunnel_netdev(struct flow_tnl *tunnel)
> >>> +{
> >>> +char iface[IFNAMSIZ];
> >>> +struct in6_addr ip6;
> >>> +struct 

Re: [ovs-dev] [PATCH v8 3/3] revalidator: Rebalance offloaded flows based on the pps rate

2018-10-26 Thread Eelco Chaudron



On 25 Oct 2018, at 16:02, Sriharsha Basavapatna wrote:


Hi Eelco,

Thanks for your review comments. Please see my response below.
On Fri, Oct 19, 2018 at 7:53 PM Eelco Chaudron  
wrote:


On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:

This is the third patch in the patch-set to support dynamic 
rebalancing

of offloaded flows.

The dynamic rebalancing functionality is implemented in this patch. 
The
ukeys that are not scheduled for deletion are obtained and passed as 
input

to the rebalancing routine. The rebalancing is done in the context of
revalidation leader thread, after all other revalidator threads are
done with gathering rebalancing data for flows.

For each netdev that is in OOR state, a list of flows - both 
offloaded
and non-offloaded (pending) - is obtained using the ukeys. For each 
netdev
that is in OOR state, the flows are grouped and sorted into offloaded 
and

pending flows. The offloaded flows are sorted in descending order of
pps-rate, while pending flows are sorted in ascending order of 
pps-rate.


The rebalancing is done in two phases. In the first phase, we try to
offload all pending flows and if that succeeds, the OOR state on the 
device
is cleared. If some (or none) of the pending flows could not be 
offloaded,
then we start replacing an offloaded flow that has a lower pps-rate 
than
a pending flow, until there are no more pending flows with a higher 
rate
than an offloaded flow. The flows that are replaced from the device 
are

added into kernel datapath.

A new OVS configuration parameter "offload-rebalance", is added to 
ovsdb.

The default value of this is "false". To enable this feature, set the
value of this parameter to "true", which provides packets-per-second
rate based policy to dynamically offload and un-offload flows.

Note: This option can be enabled only when 'hw-offload' policy is 
enabled.

It also requires 'tc-policy' to be set to 'skip_sw'; otherwise, flow
offload errors (specifically ENOSPC error this feature depends on) 
reported

by an offloaded device are supressed by TC-Flower kernel module.

Signed-off-by: Sriharsha Basavapatna 


Co-authored-by: Venkat Duvvuru 
Signed-off-by: Venkat Duvvuru 
Reviewed-by: Sathya Perla 
Reviewed-by: Simon Horman 
Reviewed-by: Ben Pfaff 
---
NEWS | 3 +
lib/dpif-netdev.c | 3 +-
lib/dpif-netlink.c | 29 ++-
lib/dpif-provider.h | 8 +-
lib/dpif.c | 30 ++-
lib/dpif.h | 12 +-
lib/netdev-provider.h | 7 +-
lib/netdev.c | 62 -
lib/netdev.h | 1 +
ofproto/ofproto-dpif-upcall.c | 446 
+-

vswitchd/vswitch.xml | 21 ++
11 files changed, 592 insertions(+), 30 deletions(-)

diff --git a/NEWS b/NEWS
index 33b4d8a23..846b46fb5 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,9 @@ Post-v2.10.0
as the default timeout for control utilities.
- ovn:
* ovn-ctl: allow passing user:group ids to the OVN daemons.
+ - ovs-vswitchd:
+ * New configuration option "offload-rebalance", that enables 
dynamic

+ rebalancing of offloaded flows.


v2.10.0 - xx xxx 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 7c0300cc5..1c01d2278 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3689,7 +3689,8 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)

}

static void
-dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t 
n_ops)
+dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t 
n_ops,

+ enum dpif_offload_type offload_type OVS_UNUSED)
{
size_t i;

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index b9ce9cbe2..2e01f5750 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2281,7 +2281,8 @@ dpif_netlink_operate_chunks(struct dpif_netlink 
*dpif, struct dpif_op **ops,

}

static void
-dpif_netlink_operate(struct dpif *dpif_, struct dpif_op **ops, 
size_t n_ops)
+dpif_netlink_operate(struct dpif *dpif_, struct dpif_op **ops, 
size_t n_ops,

+ enum dpif_offload_type offload_type)
{
struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
struct dpif_op *new_ops[OPERATE_MAX_OPS];
@@ -2289,7 +2290,12 @@ dpif_netlink_operate(struct dpif *dpif_, 
struct dpif_op **ops, size_t n_ops)

int i = 0;
int err = 0;

- if (netdev_is_flow_api_enabled()) {
+ if (offload_type == DPIF_OFFLOAD_ALWAYS && 
!netdev_is_flow_api_enabled()) {

+ VLOG_DBG("Invalid offload_type: %d", offload_type);

Here we are not returning any errors just a silent return, should we 
return EINVAL instead?

The interface has no return value (void).


I mean in op->error



+ return;
+ }
+
+ if (offload_type != DPIF_OFFLOAD_NEVER && 
netdev_is_flow_api_enabled()) {

while (n_ops > 0) {
count = 0;

@@ -2298,6 +2304,23 @@ dpif_netlink_operate(struct dpif *dpif_, 
struct dpif_op **ops, size_t n_ops)


err = try_send_to_netdev(dpif, op);
if (err && err != EEXIST) {
+ if (offload_type == DPIF_OFFLOAD_ALWAYS) {
+ /* We got an error while offloading an op. Since
+ * OFFLOAD_ALWAYS is specified, we stop further
+ * processing and return to the caller without
+ * invoking kernel datapath 

Re: [ovs-dev] [PATCH v8 2/3] revalidator: Gather packets-per-second rate of flows

2018-10-26 Thread Eelco Chaudron



On 25 Oct 2018, at 16:01, Sriharsha Basavapatna wrote:


Hi Eelco,

Thanks for your comments, please see my response below.
On Fri, Oct 19, 2018 at 7:52 PM Eelco Chaudron  
wrote:


On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:


This is the second patch in the patch-set to support dynamic
rebalancing
of offloaded flows.

The packets-per-second (pps) rate for each flow is computed in the
context
of revalidator threads when the flow stats are retrieved. The 
pps-rate

is
computed only after a flow is revalidated and is not scheduled for
deletion. The parameters used to compute pps and the pps itself are
saved
in udpif_key since they need to be persisted across iterations of
rebalancing.


Was there a specific reason to go with pps and not bytes per second?
Asking as larger packets might need more copying around vs a lot of
small packets need more flow processing.


It's a good point, it was considered; the config parameter until 
patch-v4
was defined as a string with a value of "pps-rate" to enable 
packets-per-sec
based rebalancing. The other value that we wanted to provide was 
"bps-rate"
for bytes-per-second rate. But to keep it simple for now, it was 
changed

to a boolean.


Sound good for now, guess some testing can be done later to determine if 
bps might be needed.





Signed-off-by: Sriharsha Basavapatna

Co-authored-by: Venkat Duvvuru 
Signed-off-by: Venkat Duvvuru 
Reviewed-by: Sathya Perla 
Reviewed-by: Simon Horman 
Reviewed-by: Ben Pfaff 
---
 lib/dpif-provider.h   |  1 +
 ofproto/ofproto-dpif-upcall.c | 51
+++
 2 files changed, 52 insertions(+)

diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index 873b6e3d4..7a71f5c0a 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -39,6 +39,7 @@ struct dpif {
 char *full_name;
 uint8_t netflow_engine_type;
 uint8_t netflow_engine_id;
+long long int current_ms;
 };

 void dpif_init(struct dpif *, const struct dpif_class *, const char
*name,
diff --git a/ofproto/ofproto-dpif-upcall.c
b/ofproto/ofproto-dpif-upcall.c
index 6079f..a372d6252 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -42,6 +42,8 @@
 #include "tunnel.h"
 #include "unixctl.h"
 #include "openvswitch/vlog.h"
+#include "lib/dpif-provider.h"
+#include "lib/netdev-provider.h"

 #define MAX_QUEUE_LENGTH 512
 #define UPCALL_MAX_BATCH 64
@@ -304,6 +306,13 @@ struct udpif_key {

 uint32_t key_recirc_id;   /* Non-zero if reference is held by 
the

ukey. */
 struct recirc_refs recircs;  /* Action recirc IDs with 
references

held. */
+
+#define OFFL_REBAL_INTVL_MSEC  3000  /* dynamic offload rebalance 
freq

*/
+bool offloaded;  /* True if flow is offloaded 
*/

+uint64_t flow_pps_rate;  /* Packets-Per-Second rate */
+long long int flow_time; /* last pps update time */
+uint64_t flow_packets;   /* #pkts seen in interval */
+uint64_t flow_backlog_packets;   /* prev-mode #pkts (offl or
kernel) */
 };

 /* Datapath operation with optional ukey attached. */
@@ -1667,6 +1676,10 @@ ukey_create__(const struct nlattr *key, 
size_t

key_len,
 ukey->stats.used = used;
 ukey->xcache = NULL;

+ukey->offloaded = false;
+ukey->flow_time = 0;
+ukey->flow_packets = ukey->flow_backlog_packets = 0;
+


For clarity I think we need to set the flow_pps_rate to 0 also.


agreed



 ukey->key_recirc_id = key_recirc_id;
 recirc_refs_init(>recircs);
 if (xout) {
@@ -2442,6 +2455,39 @@ reval_op_init(struct ukey_op *op, enum
reval_result result,
 }
 }

+static uint64_t
+udpif_flow_packet_delta(struct udpif_key *ukey, const struct
dpif_flow *f)
+{
+return f->stats.n_packets + ukey->flow_backlog_packets -
+ukey->flow_packets;


It does not make sense to me why the flow_backlog_packets get added
here, and also to the flow_packets? See also below.


When we switch modes (kernel->offload or vice versa), the flow stat 
counters
(f->stats.n_packets) get reset and start from 0 again. This leads to 
wrong
computation of flow pps after switching modes. To avoid this, we keep 
track
of packets in the previous mode at the time of switching modes. Then 
in the
new mode, we start saving the packets seen so far at every iteration 
in
ukey->flow_packets, including the total packets seen in previous mode. 
So
while computing the packet delta, we need to account for the backlog 
packets
(since we'd have saved them in prev iteration in ukey->flow_packets); 
thus we

add flow->backlog_packets to f->stats.n_packets.



Why would we need to save these values? If the mode changes a new 
interval starts, so only the packets for this interval matter? Could 
this because you do not reset flow_packets when calling 
udpif_set_ukey_backlog_packets()?
It also might be that I’m missing somehting as it’s Friday :)  
Anyhow I think it does require some documentation in the code.





+}
+
+static long long 

Re: [ovs-dev] [PATCH v8 1/3] dpif-netlink: Detect Out-Of-Resource condition on a netdev

2018-10-26 Thread Eelco Chaudron




On 25 Oct 2018, at 16:00, Sriharsha Basavapatna wrote:


Hi Eelco,

Thanks for your comments, please see my response below.
On Fri, Oct 19, 2018 at 7:52 PM Eelco Chaudron  
wrote:


On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:


This is the first patch in the patch-set to support dynamic
rebalancing
of offloaded flows.

The patch detects OOR condition on a netdev port when ENOSPC error 
is

returned by TC-Flower while adding a flow rule. A new structure is
added
to the netdev called "netdev_hw_info", to store OOR related
information
required to perform dynamic offload-rebalancing.

Signed-off-by: Sriharsha Basavapatna

Co-authored-by: Venkat Duvvuru 
Signed-off-by: Venkat Duvvuru 
Reviewed-by: Sathya Perla 
Reviewed-by: Simon Horman 
Reviewed-by: Ben Pfaff 
---
 lib/dpif-netlink.c| 18 +-
 lib/flow.c| 25 +
 lib/flow.h|  1 +
 lib/netdev-provider.h | 11 +++
 lib/netdev.c  | 34 ++
 lib/netdev.h  |  3 +++
 6 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e6d5a6ec5..b9ce9cbe2 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2178,7 +2178,23 @@ parse_flow_put(struct dpif_netlink *dpif,
struct dpif_flow_put *put)

 VLOG_DBG("added flow");
 } else if (err != EEXIST) {
-VLOG_ERR_RL(, "failed to offload flow: %s",
ovs_strerror(err));
+struct netdev *oor_netdev = NULL;
+if (err == ENOSPC &&
netdev_is_offload_rebalance_policy_enabled()) {
+/*
+ * We need to set OOR on the input netdev (i.e, 'dev')
for the
+ * flow. But if the flow has a tunnel attribute (i.e,
decap action,
+ * with a virtual device like a VxLAN interface as its
in-port),
+ * then lookup and set OOR on the underlying tunnel
(real) netdev.
+ */
+oor_netdev = 
flow_get_tunnel_netdev();

+if (!oor_netdev) {
+/* Not a 'tunnel' flow */
+oor_netdev = dev;
+}
+netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true);


Why not just oor_netdev->hw_info.oor = true, see also below.


The original code was directly accessing netdev members. It was 
changed

based on a review comment to avoid direct access and add an interface.



I have a general comment, don't know where to put it, so I put it 
here.
Some hardware might have multiple tables. If one type of table is 
full
the ENOSPC might be returned, but it does not mean all type of flows 
can

no longer be offloaded. This might be a situation to think about.


Ok, thanks for bringing it up. Currently from OvS daemon's perspective 
a
request to add/delete a flow is issued on a netdev and the failure 
indicates
that the particular netdev is out of resources. If we need to handle 
the
condition where HW has different tables, we need to further extend 
this
design and the tc interfaces to propagate this fine grained 
information.


Would be good if other hardware vendors can comment here?




+}
+VLOG_ERR_RL(, "failed to offload flow: %s: %s",
ovs_strerror(err),
+(oor_netdev ? oor_netdev->name : dev->name));
 }

 out:
diff --git a/lib/flow.c b/lib/flow.c
index 77ed3d9df..a39807908 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -41,6 +42,8 @@
 #include "unaligned.h"
 #include "util.h"
 #include "openvswitch/nsh.h"
+#include "ovs-router.h"
+#include "lib/netdev-provider.h"

 COVERAGE_DEFINE(flow_extract);
 COVERAGE_DEFINE(miniflow_malloc);
@@ -3403,3 +3406,25 @@ flow_limit_vlans(int vlan_limit)
 flow_vlan_limit = MIN(vlan_limit, FLOW_MAX_VLAN_HEADERS);
 }
 }
+
+struct netdev *
+flow_get_tunnel_netdev(struct flow_tnl *tunnel)
+{
+char iface[IFNAMSIZ];
+struct in6_addr ip6;
+struct in6_addr gw;
+
+if (tunnel->ip_src) {
+in6_addr_set_mapped_ipv4(, tunnel->ip_src);
+} else if (ipv6_addr_is_set(>ipv6_src)) {
+ip6 = tunnel->ipv6_src;
+} else {
+return NULL;
+}
+
+if (!ovs_router_lookup(0, , iface, NULL, )) {
+return NULL;
+}
+
+return netdev_from_name(iface);
+}
diff --git a/lib/flow.h b/lib/flow.h
index d03f1ba9c..aca60c41a 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -73,6 +73,7 @@ void flow_extract(struct dp_packet *, struct flow
*);
 void flow_zero_wildcards(struct flow *, const struct flow_wildcards
*);
 void flow_unwildcard_tp_ports(const struct flow *, struct
flow_wildcards *);
 void flow_get_metadata(const struct flow *, struct match
*flow_metadata);
+struct netdev *flow_get_tunnel_netdev(struct flow_tnl *tunnel);

 const char *ct_state_to_string(uint32_t state);
 uint32_t ct_state_from_string(const char *);
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 5a7947351..e320dad61 100644
---