[ovs-dev] Request for soft freeze exceptions for the below OVN patches.

2019-07-08 Thread Numan Siddique
Hi,

It would be great if the below OVN patches could be considered during the
soft freeze window:

 - ovn: Add a new logical switch port type - 'virtual' -
https://patchwork.ozlabs.org/patch/1128402/
 - ovn-northd: Add the option to pause and resume -
https://patchwork.ozlabs.org/patch/1129510/


Thanks
Numan
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovn-northd: Add the option to pause and resume

2019-07-08 Thread Numan Siddique
On Mon, Jul 8, 2019 at 11:22 PM Mark Michelson  wrote:

> Hi Numan,
>
> The code changes all look good, but there are a few problems with the
> documentation. I've noted them down below.
>
>
Oops. Lots of typos and mistakes.

Thanks for the review. I corrected them and submitted v3.

Thanks
Numan

On 7/8/19 11:41 AM, nusid...@redhat.com wrote:
> > From: Numan Siddique 
> >
> > This patch adds 3 unixctl socket commands - pause, resume and is-paused.
> >
> > Usage: ovs-appctl -t ovn-northd pause/resume/is-paused
> >
> > This feature will be useful if the CMS wants to
> >- deploy OVN DB servers in active/passive mode and
> >- run ovn-northd on all these nodes and use unix ctl sockets to
> >  connect to the local OVN DB servers.
> >
> > On the nodes where OVN DB ovsdb-servers are in passive mode, the local
> > ovn-northds will process the DB changes and calculate logical flows only
> > to be thrown out later, because write transactions are not allowed by
> > these ovsdb-servers. This results in unnecessary CPU usage.
> >
> > With these commands, the CMS can pause ovn-northd on these nodes. A node
> > that becomes master can resume its ovn-northd.
> >
> > This feature will be useful in ovn-kubernetes if the above deployment
> model
> > is chosen.
> >
> > Signed-off-by: Numan Siddique 
> > ---
> > v1 -> v2
> > ===
> >* Addressed the review comments from Ben and add more documentation
> >  about the runtime options added by this patch.
> >* v1 had an issue - When paused, it was not even waking up to process
> >  the IDL updates. In v2, the main thread, wakes up to process any
> >  IDL updates, but doesn't do any logical flow computations.
> >
> >   ovn/northd/ovn-northd.8.xml |  48 +++
> >   ovn/northd/ovn-northd.c | 117 
> >   tests/ovn-northd.at |  38 
> >   3 files changed, 177 insertions(+), 26 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > index e6417220f..0766902db 100644
> > --- a/ovn/northd/ovn-northd.8.xml
> > +++ b/ovn/northd/ovn-northd.8.xml
> > @@ -70,6 +70,23 @@
> > 
> >   Causes ovn-northd to gracefully terminate.
> > 
> > +
> > +  pause
> > +  
> > +Pauses the ovn-northd operation from processing any Northbound
> and
> > +Southbound database changes.
> > +  
> > +
> > +  resume
> > +  
> > +Resumes the ovn-northd operation to process Northbound and
> > +Southbound database contents and generate logical flows.
> > +  
> > +
> > +  is-paused
> > +  
> > +Returns "true" if ovn-northd is currently paused, "false"
> otherwise.
> > +  
> > 
> >   
> >
> > @@ -82,6 +99,37 @@
> > of ovn-northd will automatically take over.
> >   
> >
> > + Active-Standby with multiple OVN DB servers
> > +
> > +  You may run multiple OVN DB servers in an OVN deployment with:
> > +  
> > +
> > +  OVN DB servers deployed in active/passive mode with one active
> > +  and multiple passive ovsdb-servers.
> > +
> > +
> > +
> > +  ovn-northd also deployed on all thes nodes
>
> "thes" should either be "these" or "the". I'm not sure which you intended.
>
> > +  using unix ctl sockets to connect to the local OVN DB servers.
> > +
> > +  
> > +
> > +
> > +
> > +  In such deployments, the ovn-northds on the passive nodes will
> process
> > +  the DB changes and calculate logical flows to be throw out later
>
> s/throw/thrown/
>
> > +  because write txns are not allowed by the passive ovsdb-servers.
>
> I think writing out the word "transaction" is preferred over "txn" for
> documentation.
>
> > +  It results in unncessary CPU usage.
>
> s/unncessary/unnecessary/
>
> > +
> > +
> > +
> > +  With the help of runtime management command pause,
> you can
> > +  pause ovn-northd on these nodes. When a passive node
> > +  becomes master, you can use the runtime management command
> > +  resume to resume the ovn-northd to
> process the
> > +  DB changes.
> > +
> > +
> >   Logical Flow Table Structure
> >
> >   
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 0b0a96a3a..05ddd60e3 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -50,6 +50,9 @@
> >   VLOG_DEFINE_THIS_MODULE(ovn_northd);
> >
> >   static unixctl_cb_func ovn_northd_exit;
> > +static unixctl_cb_func ovn_northd_pause;
> > +static unixctl_cb_func ovn_northd_resume;
> > +static unixctl_cb_func ovn_northd_is_paused;
> >
> >   struct northd_context {
> >   struct ovsdb_idl *ovnnb_idl;
> > @@ -8639,6 +8642,7 @@ main(int argc, char *argv[])
> >   struct unixctl_server *unixctl;
> >   int retval;
> >   bool exiting;
> > +bool paused;
> >
> >   fatal_ignore_sigpipe();
> >   

[ovs-dev] [PATCH v3] ovn-northd: Add the option to pause and resume

2019-07-08 Thread nusiddiq
From: Numan Siddique 

This patch adds 3 unixctl socket commands - pause, resume and is-paused.

Usage: ovs-appctl -t ovn-northd pause/resume/is-paused

This feature will be useful if the CMS wants to
  - deploy OVN DB servers in active/passive mode and
  - run ovn-northd on all these nodes and use unix ctl sockets to
connect to the local OVN DB servers.

On the nodes where OVN DB ovsdb-servers are in passive mode, the local
ovn-northds will process the DB changes and compute logical flows only to be
thrown out later, because write transactions are not allowed by these
ovsdb-servers. This results in unnecessary CPU usage.

With these commands, the CMS can pause ovn-northd on these nodes. A node
that becomes master can resume its ovn-northd.

One use case for this feature is ovn-kubernetes with the above deployment
model.

Signed-off-by: Numan Siddique 
---

v2 -> v3
===
  * Fixed the typos pointed out by Mark in ovn-northd.8.xml

v1 -> v2
===
  * Addressed the review comments from Ben and add more documentation
about the runtime options added by this patch.
  * v1 had an issue - When paused, it was not even waking up to process
the IDL updates. In v2, the main thread, wakes up to process any
IDL updates, but doesn't do any logical flow computations.


 ovn/northd/ovn-northd.8.xml |  48 +++
 ovn/northd/ovn-northd.c | 117 
 tests/ovn-northd.at |  38 
 3 files changed, 177 insertions(+), 26 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index d2267de0e..1d0243656 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -70,6 +70,23 @@
   
 Causes ovn-northd to gracefully terminate.
   
+
+  pause
+  
+Pauses the ovn-northd operation from processing any Northbound and
+Southbound database changes.
+  
+
+  resume
+  
+Resumes the ovn-northd operation to process Northbound and
+Southbound database contents and generate logical flows.
+  
+
+  is-paused
+  
+Returns "true" if ovn-northd is currently paused, "false" otherwise.
+  
   
 
 
@@ -82,6 +99,37 @@
   of ovn-northd will automatically take over.
 
 
+ Active-Standby with multiple OVN DB servers
+
+  You may run multiple OVN DB servers in an OVN deployment with:
+  
+
+  OVN DB servers deployed in active/passive mode with one active
+  and multiple passive ovsdb-servers.
+
+
+
+  ovn-northd also deployed on all these nodes,
+  using unix ctl sockets to connect to the local OVN DB servers.
+
+  
+
+
+
+  In such deployments, the ovn-northds on the passive nodes will process
+  the DB changes and compute logical flows to be thrown out later,
+  because write transactions are not allowed by the passive ovsdb-servers.
+  It results in unnecessary CPU usage.
+
+
+
+  With the help of runtime management command pause, you can
+  pause ovn-northd on these nodes. When a passive node
+  becomes master, you can use the runtime management command
+  resume to resume the ovn-northd to process the
+  DB changes.
+
+
 Logical Flow Table Structure
 
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index ce382ac89..50f4ebf99 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -50,6 +50,9 @@
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
 
 static unixctl_cb_func ovn_northd_exit;
+static unixctl_cb_func ovn_northd_pause;
+static unixctl_cb_func ovn_northd_resume;
+static unixctl_cb_func ovn_northd_is_paused;
 
 struct northd_context {
 struct ovsdb_idl *ovnnb_idl;
@@ -8710,6 +8713,7 @@ main(int argc, char *argv[])
 struct unixctl_server *unixctl;
 int retval;
 bool exiting;
+bool paused;
 
 fatal_ignore_sigpipe();
 ovs_cmdl_proctitle_init(argc, argv);
@@ -8724,6 +8728,10 @@ main(int argc, char *argv[])
 exit(EXIT_FAILURE);
 }
unixctl_command_register("exit", "", 0, 0, ovn_northd_exit, &exiting);
+unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, &paused);
+unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, &paused);
+unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused,
+ &paused);
 
 daemonize_complete();
 
@@ -8880,32 +,49 @@ main(int argc, char *argv[])
 
 /* Main loop. */
 exiting = false;
+paused = false;
 while (!exiting) {
-struct northd_context ctx = {
-.ovnnb_idl = ovnnb_idl_loop.idl,
-.ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
-.ovnsb_idl = ovnsb_idl_loop.idl,
-.ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
-.sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name,
-};
-
-if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
-

Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom

2019-07-08 Thread Gregory Rose



On 7/8/2019 4:18 PM, Gregory Rose wrote:

On 7/8/2019 4:08 PM, David Miller wrote:

From: Taehee Yoo 
Date: Sat,  6 Jul 2019 01:08:09 +0900


When a vport is deleted, the maximum headroom size may change.
If the vport which has the largest headroom is deleted,
a new max_headroom would be set.
But if the new headroom size is equal to the old headroom size,
the update routine is unnecessary.

Signed-off-by: Taehee Yoo 
I'm not so sure about the logic here and I'd therefore like an OVS 
expert

to review this.


I'll review and test it and get back.  Pravin may have input as well.



Err, adding Pravin.

- Greg


Thanks,

- Greg


Thanks.


Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom

2019-07-08 Thread Gregory Rose

On 7/8/2019 4:08 PM, David Miller wrote:

From: Taehee Yoo 
Date: Sat,  6 Jul 2019 01:08:09 +0900


When a vport is deleted, the maximum headroom size may change.
If the vport which has the largest headroom is deleted,
a new max_headroom would be set.
But if the new headroom size is equal to the old headroom size,
the update routine is unnecessary.

Signed-off-by: Taehee Yoo 

I'm not so sure about the logic here and I'd therefore like an OVS expert
to review this.


I'll review and test it and get back.  Pravin may have input as well.

Thanks,

- Greg


Thanks.


Re: [ovs-dev] [PATCH net-next] net: openvswitch: do not update max_headroom if new headroom is equal to old headroom

2019-07-08 Thread David Miller
From: Taehee Yoo 
Date: Sat,  6 Jul 2019 01:08:09 +0900

> When a vport is deleted, the maximum headroom size may change.
> If the vport which has the largest headroom is deleted,
> a new max_headroom would be set.
> But if the new headroom size is equal to the old headroom size,
> the update routine is unnecessary.
> 
> Signed-off-by: Taehee Yoo 

I'm not so sure about the logic here and I'd therefore like an OVS expert
to review this.

Thanks.


Re: [ovs-dev] [PATCH net-next] net: openvswitch: use netif_ovs_is_port() instead of opencode

2019-07-08 Thread David Miller
From: Taehee Yoo 
Date: Sat,  6 Jul 2019 01:05:46 +0900

> Use netif_ovs_is_port() function instead of open code.
> This patch doesn't change logic.
> 
> Signed-off-by: Taehee Yoo 

Applied.


Re: [ovs-dev] [PATCH v4 0/3] vhost tx retry updates

2019-07-08 Thread Kevin Traynor
On 08/07/2019 13:45, Ian Stokes wrote:
> On 7/2/2019 1:32 AM, Kevin Traynor wrote:
>> v4:
>> - 1/2 New patch: Move vhost tx retries doc to a separate section (David)
>>
>> - 2/3
>> -- Changed tx_retries to be a custom stat for vhost (Ilya)
>> -- Added back in MIN() that was dropped in v2, as in retesting I
>> saw it is needed when the retry limit is reached to prevent
>> an accounting error
>> -- note: this is not a typo, it was just out of date
>>  -/* 44 pad bytes here. */
>>  +/* 4 pad bytes here. */
>> -- Removed acks due to the amount of changes made
>>
> 
> Thanks all for the work reviewing this. I've made the final amendments 
> suggested and pushed to master.
> 

Thanks to everyone for the comments/suggestions and thanks Ian for
making those last few edits.

Kevin.

> Thanks
> Ian
> 



Re: [ovs-dev] [PATCH] ofproto: Preserve ofport number for failed ofport across restarts

2019-07-08 Thread Ben Pfaff
On Thu, Jun 13, 2019 at 08:28:02AM +, Vishal Deep Ajmera wrote:
> 
> > 
> > This patch fixes both of the above issues by retaining the ofport number
> > as long as the ofport entry exists in the OVSDB.
> > 
> > Warm Regards,
> > Vishal Ajmera
> 
> Hi Ben,
> 
> Did you get a chance to review this patch?

I've been taking a look at this patch for the last few minutes.  It
introduces a lot of mechanism for the use case.  Did you consider any
simpler mechanisms to achieve the same effect?  What prevented them from
working?

Thanks,

Ben.


Re: [ovs-dev] [patch v5] conntrack: Optimize recirculations.

2019-07-08 Thread Ben Pfaff
On Sun, Jun 09, 2019 at 07:32:03AM -0700, Darrell Ball wrote:
> Cache the 'conn' context and use it when it is valid.  The cached 'conn'
> context will get reset if it is not expected to be valid; a negative test
> is added to check this.
> 
> Signed-off-by: Darrell Ball 
> ---
> 
> v5: Check for alg ctl on recirculation to handle potential corner case.
> Remove unnecessary 'commit' filter.
> 
> v4: Reset 'conn' cache context automatically when tuple changes, rather than
> needing an explicit 'ct_clear' action.  Need to check if all cases are
> handled.
> v3: Remove unneeded 'NAT' field added to 'struct conn_lookup_ctx'.

It looks like this got missed for review.  Oops.

This is nice!  I guess RCU makes it safe to keep a "struct conn *"
pointer around without worrying about it too much?

How much time does this save?  Do you have an idea where the savings
come from?  I have two motivations for asking:

   1. Do the savings warrant the (slight) complexity and (some) risk?

   2. I think that the cost that this saves can be divided into
  extraction and hash table lookup.  If the hash table lookup is the
  expensive part, then we could skip the work here on invalidating
  when the packet changes, instead always doing extraction and then
  comparing against the cache entry.

Your thoughts?

The 'conn' member can be declared as type "struct conn *" rather than
"void *", for slightly better safety.

Please don't mark functions in .c files as 'inline' without some
exceptional reason.  coding-style.rst says why:

Functions in ``.c`` files should not normally be marked ``inline``,
because it does not usually help code generation and it does
suppress compiler warnings about unused functions. (Functions
defined in ``.h`` usually should be marked ``inline``.)

All the "continue;"s in conntrack_execute() start to look odd.  Maybe
like this instead?

if (OVS_UNLIKELY(packet->md.ct_state == CS_INVALID)) {
write_ct_md(packet, zone, NULL, NULL, NULL);
} else if (conn && conn->key.zone == zone && !force
   && !get_alg_ctl_type(packet, tp_src, tp_dst, helper)) {
process_one_fast(zone, setmark, setlabel, nat_action_info,
 conn, packet);
} else if (OVS_UNLIKELY(!conn_key_extract(ct, packet, dl_type, &ctx,
zone))) {
packet->md.ct_state = CS_INVALID;
write_ct_md(packet, zone, NULL, NULL, NULL);
} else {
process_one(ct, packet, &ctx, zone, force, commit, now, setmark,
setlabel, nat_action_info, tp_src, tp_dst, helper);
}

Thanks,

Ben.


Re: [ovs-dev] [PATCHv14 2/2] netdev-afxdp: add new netdev type for AF_XDP.

2019-07-08 Thread William Tu
On Mon, Jul 8, 2019 at 1:50 PM William Tu  wrote:
>
> Hi Ilya,
>
> Thanks for all the feedback!
>
> On Fri, Jul 5, 2019 at 8:32 AM Ilya Maximets  wrote:
> >
> > On 01.07.2019 19:08, William Tu wrote:
> > > The patch introduces experimental AF_XDP support for OVS netdev.
> > > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> > > type built upon the eBPF and XDP technology.  It aims to have
> > > comparable
> > > performance to DPDK but cooperate better with existing kernel's networking
> > > stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP 
> > > program
> > > attached to the netdev, bypassing a couple of the Linux kernel's subsystems.
> > > As a result, an AF_XDP socket shows much better performance than AF_PACKET.
> > > For more details about AF_XDP, please see linux kernel's
> > > Documentation/networking/af_xdp.rst. Note that by default, this feature is
> > > not compiled in.
> > >
> > > Signed-off-by: William Tu 
> > > ---
> > >
> > > v13-v14
> > > - Mainly address issue reported by Ilya
> > >   
> > > https://protect2.fireeye.com/url?k=a0e04e091a64b944.a0e1c546-0896b5863118ce25=https://patchwork.ozlabs.org/patch/1118972/
> > >   when doing 'make check-afxdp'
> > > - Fix xdp frame headroom issue
> > > - Fix vlan test cases by disabling txvlan offload
> > > - Skip cvlan
> > > - Document TCP limitation (currently all tcp tests fail due to
> > >   kernel veth driver)
> >
> > Maybe it's better to skip them too? The easy way could be:
> >
> > diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> > index 5ee2ceb1a..f0683c0a9 100644
> > --- a/tests/system-afxdp-macros.at
> > +++ b/tests/system-afxdp-macros.at
> > @@ -30,3 +30,10 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
> >   AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> >  ]
> >  )
> > +
> > +# OVS_START_L7([namespace], [protocol])
> > +#
> > +# AF_XDP doesn't work with TCP over virtual interfaces for now.
> > +#
> > +m4_define([OVS_START_L7],
> > +   [AT_SKIP_IF([:])])
> > ---
> >
> > BTW, documentation should stay.
> >
> OK Will do it!
>
> I rebase to master, added the above and run
> make check-afxdp TESTSUITEFLAGS='-v -x 2'
> it hangs at
> ip link del ovs-$1
I think it's related to your pending patch to kernel about
"Fix hang while unregistering device bound to xdp socket"
https://lkml.org/lkml/2019/6/27/255

I will try and see if it fixes.
Thanks
William

> I have to remove it to make it work
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -15,7 +15,6 @@ m4_define([ADD_VETH],
>if test -n "$6"; then
>  NS_CHECK_EXEC([$2], [ip route add default via $6])
>fi
> -  on_exit 'ip link del ovs-$1'
>  ]
>  )
> I don't know why, but in the end
> ip netns del at_ns0
> cleans up the ovs-p0 device
>


Re: [ovs-dev] [PATCH v3] OVN: run local logical flows first in S_ROUTER_OUT_SNAT table

2019-07-08 Thread Ben Pfaff
On Sat, Jul 06, 2019 at 12:45:00PM +0200, Lorenzo Bianconi wrote:
> Run local logical flows first if the gw router port is scheduled
> on the local chassis in order to properly manage snat traffic
> 
> Tested-by: Eran Kuris 
> Acked-by: Numan Siddique 
> Signed-off-by: Lorenzo Bianconi 
> ---
> Changes since v2:
> - fix compilation error
> Changes since v1:
> - add priority change in ovn-northd.8.xml

Thanks, applied to master.


Re: [ovs-dev] [PATCH v2] rhel: Fixed a bug for checking the correct major version and revision.

2019-07-08 Thread Ben Pfaff
On Mon, Jul 08, 2019 at 09:51:29AM -0700, Ashish Varma wrote:
> Fixed a bug where a check for major version 3.10 with a major revision not
> equal to 327, 693, or 957 should have fallen through to the default else at
> the end. In the current code, the default else condition does not get
> executed for kernels with major version 3.10 and a major revision other
> than 327/693/957, resulting in failure to load the kernel module.
> 
> Fixes: 402efbe4e176 ("rhel: Add 4.12 kernel support in ovs-kmod-manage.sh")
> Signed-off-by: Ashish Varma 
> ---
> v1-v2:
> Added the Fixes tag.

I applied this to master.  I don't think it needs backporting but please
correct me if I am wrong.


Re: [ovs-dev] [PATCH v2 0/3] ovn-controller: Fix and refactor chassis ovn-sbdb record init

2019-07-08 Thread Ben Pfaff
On Mon, Jul 08, 2019 at 12:06:45PM +0200, Dumitru Ceara wrote:
> The chassis_run code didn't take into account the scenario when the
> system-id was changed in the Open_vSwitch table. Due to this the code
> was trying to insert a new Chassis record in the OVN_Southbound DB with
> the same Encaps as the previous Chassis record. The transaction used
> to insert the new records was aborting due to the ["type", "ip"]
> index constraint violation as we were creating new Encap entries with
> the same "type" and "ip" as the old ones.

Thanks.  I applied this series to master.


Re: [ovs-dev] [PATCHv14 2/2] netdev-afxdp: add new netdev type for AF_XDP.

2019-07-08 Thread William Tu
On Fri, Jul 5, 2019 at 10:48 AM Ilya Maximets  wrote:
>
> Few more comments inline.
>
> On 01.07.2019 19:08, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> > type built upon the eBPF and XDP technology.  It aims to have comparable
> > performance to DPDK but cooperate better with existing kernel's networking
> > stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> > attached to the netdev, bypassing a couple of the Linux kernel's subsystems.
> > As a result, an AF_XDP socket shows much better performance than AF_PACKET.
> > For more details about AF_XDP, please see linux kernel's
> > Documentation/networking/af_xdp.rst. Note that by default, this feature is
> > not compiled in.
> >
> > Signed-off-by: William Tu 
> > ---
>
> 
>
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index 0976a35e758b..cc746b70af89 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -19,6 +19,7 @@
> >  #include 
> >
> >  #include "dp-packet.h"
> > +#include "netdev-afxdp.h"
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/dynamic-string.h"
> >  #include "util.h"
> > @@ -59,6 +60,28 @@ dp_packet_use(struct dp_packet *b, void *base, size_t 
> > allocated)
> >  dp_packet_use__(b, base, allocated, DPBUF_MALLOC);
> >  }
> >
> > +#if HAVE_AF_XDP
> > +/* Initialize 'b' as an empty dp_packet that contains
> > + * memory starting at AF_XDP umem base.
> > + */
> > +void
> > +dp_packet_use_afxdp(struct dp_packet *b, void *data, size_t allocated,
> > +size_t headroom)
> > +{
> > +dp_packet_set_base(b, (char *)data - headroom);
> > +dp_packet_set_data(b, data);
> > +dp_packet_set_size(b, 0);
> > +
> > +dp_packet_set_allocated(b, allocated);
> > +b->source = DPBUF_AFXDP;
> > +dp_packet_reset_offsets(b);
> > +pkt_metadata_init(&b->md, 0);
> > +dp_packet_reset_cutlen(b);
> > +dp_packet_reset_offload(b);
> > +b->packet_type = htonl(PT_ETH);
>
> Above block could be replaced with:
>
> dp_packet_init__(b, allocated, DPBUF_AFXDP);
>
> At least it misses init_specific().
>

Yes, thank you.

> > +}
> > +#endif
> > +
> >  /* Initializes 'b' as an empty dp_packet that contains the 'allocated' 
> > bytes of
> >   * memory starting at 'base'.  'base' should point to a buffer on the 
> > stack.
> >   * (Nothing actually relies on 'base' being allocated on the stack.  It 
> > could
> > @@ -122,6 +145,8 @@ dp_packet_uninit(struct dp_packet *b)
> >   * created as a dp_packet */
> >  free_dpdk_buf((struct dp_packet*) b);
> >  #endif
> > +} else if (b->source == DPBUF_AFXDP) {
> > +free_afxdp_buf(b);
> >  }
> >  }
> >  }
>
> 
>
> > +int
> > +netdev_afxdp_get_stats(const struct netdev *netdev,
> > +   struct netdev_stats *stats)
> > +{
> > +struct netdev_linux *dev = netdev_linux_cast(netdev);
> > +struct xsk_socket_info *xsk_info;
> > +struct netdev_stats dev_stats;
> > +int error, i;
> > +
> > +ovs_mutex_lock(&dev->mutex);
> > +
> > +error = get_stats_via_netlink(netdev, &dev_stats);
> > +if (error) {
> > +VLOG_WARN_RL(&rl, "Error getting AF_XDP statistics");
> > +} else {
> > +/* Use kernel netdev's packet and byte counts */
> > +stats->rx_packets = dev_stats.rx_packets;
> > +stats->rx_bytes = dev_stats.rx_bytes;
> > +stats->tx_packets = dev_stats.tx_packets;
> > +stats->tx_bytes = dev_stats.tx_bytes;
> > +
> > +stats->rx_errors   += dev_stats.rx_errors;
> > +stats->tx_errors   += dev_stats.tx_errors;
> > +stats->rx_dropped  += dev_stats.rx_dropped;
> > +stats->tx_dropped  += dev_stats.tx_dropped;
> > +stats->multicast   += dev_stats.multicast;
> > +stats->collisions  += dev_stats.collisions;
> > +stats->rx_length_errors+= dev_stats.rx_length_errors;
> > +stats->rx_over_errors  += dev_stats.rx_over_errors;
> > +stats->rx_crc_errors   += dev_stats.rx_crc_errors;
> > +stats->rx_frame_errors += dev_stats.rx_frame_errors;
> > +stats->rx_fifo_errors  += dev_stats.rx_fifo_errors;
> > +stats->rx_missed_errors+= dev_stats.rx_missed_errors;
> > +stats->tx_aborted_errors   += dev_stats.tx_aborted_errors;
> > +stats->tx_carrier_errors   += dev_stats.tx_carrier_errors;
> > +stats->tx_fifo_errors  += dev_stats.tx_fifo_errors;
> > +stats->tx_heartbeat_errors += dev_stats.tx_heartbeat_errors;
> > +stats->tx_window_errors+= dev_stats.tx_window_errors;
> > +
> > +/* Account the dropped in each xsk */
> > +for (i = 0; i < netdev_n_rxq(netdev); i++) {
> > +xsk_info = dev->xsks[i];
> > +if (xsk_info) {
> > +stats->rx_dropped += 

Re: [ovs-dev] [PATCHv14 2/2] netdev-afxdp: add new netdev type for AF_XDP.

2019-07-08 Thread William Tu
Hi Ilya,

Thanks for all the feedback!

On Fri, Jul 5, 2019 at 8:32 AM Ilya Maximets  wrote:
>
> On 01.07.2019 19:08, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> > type built upon the eBPF and XDP technology.  It aims to have comparable
> > performance to DPDK but cooperate better with existing kernel's networking
> > stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> > attached to the netdev, bypassing a couple of the Linux kernel's subsystems.
> > As a result, an AF_XDP socket shows much better performance than AF_PACKET.
> > For more details about AF_XDP, please see linux kernel's
> > Documentation/networking/af_xdp.rst. Note that by default, this feature is
> > not compiled in.
> >
> > Signed-off-by: William Tu 
> > ---
> >
> > v13-v14
> > - Mainly address issue reported by Ilya
> >   
> > https://protect2.fireeye.com/url?k=a0e04e091a64b944.a0e1c546-0896b5863118ce25=https://patchwork.ozlabs.org/patch/1118972/
> >   when doing 'make check-afxdp'
> > - Fix xdp frame headroom issue
> > - Fix vlan test cases by disabling txvlan offload
> > - Skip cvlan
> > - Document TCP limitation (currently all tcp tests fail due to
> >   kernel veth driver)
>
> Maybe it's better to skip them too? The easy way could be:
>
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> index 5ee2ceb1a..f0683c0a9 100644
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -30,3 +30,10 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
>   AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
>  ]
>  )
> +
> +# OVS_START_L7([namespace], [protocol])
> +#
> +# AF_XDP doesn't work with TCP over virtual interfaces for now.
> +#
> +m4_define([OVS_START_L7],
> +   [AT_SKIP_IF([:])])
> ---
>
> BTW, documentation should stay.
>
OK Will do it!

I rebase to master, added the above and run
make check-afxdp TESTSUITEFLAGS='-v -x 2'
it hangs at
ip link del ovs-$1
I have to remove it to make it work
--- a/tests/system-afxdp-macros.at
+++ b/tests/system-afxdp-macros.at
@@ -15,7 +15,6 @@ m4_define([ADD_VETH],
   if test -n "$6"; then
 NS_CHECK_EXEC([$2], [ip route add default via $6])
   fi
-  on_exit 'ip link del ovs-$1'
 ]
 )
I don't know why, but in the end
ip netns del at_ns0
cleans up the ovs-p0 device

> > - Fix tunnel test cases due to --disable-system (another patch)
> > - Switch to use pthread_spin_lock, suggested by Ben
> > - Add coverage counter for debugging
> > - Fix buffer starvation issue at batch_send reported by Eelco
> >   when using tap device with type=afxdp
> > - TODO:
> >   'make check-afxdp' still not all pass
> >   IP fragmentation expiry test not fix yet, need to implement
> >   deferred memory free, something like dpdk_mp_sweep.  Currently hit
> >   some missing umem descs when reclaiming.
> > ---
>
> Instead of inline comments I'll suggest incremental patch (below) with
> following explanations:
>
> * You still need to return EAGAIN in rxq_recv if socket is not ready.
>   Otherwise upper layers will count the call as successful.
>
> * sendto() in SKB mode will send at most 16 packets, but our batch might
>   be up to 32 packets. If batch size will be larger than 16, TX ring will
>   be quickly exhausted and transmission will be blocked. We have to kick
>   the kernel up to n_packets / 16 times to be sure that all packets are
>   transmitted. This fixes my issues with zero traffic in test with iperf
>   between 2 namespaces while more than 2 traffic flows involved.
>
Thanks for getting down to the root cause!

> * EAGAIN is very likely case for sendto.
>
> * Suggesting to reclaim all CONS_NUM_DESCS addresses from the completion
>   queue each time instead of BATCH_SIZE addresses. This will guarantee
>   that we'll have enough cq entries for all cases. This is important
>   because we may have much more than BATCH_SIZE packets in outstanding_tx.

Good point, thanks.

>
> * No need to kick_tx inside afxdp_complete_tx since we'll retry more times
>   while kicking on transmit.
>
> * Suggesting warning on exhausted memory pool. This is a bad case and should
>   be reported.
>
> * 'concurrent_txq' is a likely case, because we can't configure tx queues
>   separately from rx queues, so we'll likely have number of tx queues less
>   than number of threads --> 'concurrent_txq' will likely be true.
>   I think that we need to just remove 'OVS_UNLIKELY' macro there.
>
> * The worst case for the mpool size is larger than just size of the rings.
>   There might be more packets currently under processing in OVS, stored
>   in output batches in datapath. It's hard to estimate the good size.
>   Suggesting just to increase two times for now.

Thanks, I saw your comments above the code.


>
>
> Few additional questions/requests:
>
> * Why do we need dev->xdp_flags and dev->xdp_bind_flags? These are not used and
>   mostly just 



Re: [ovs-dev] [PATCH v3] OVN: run local logical flows first in S_ROUTER_OUT_SNAT table

2019-07-08 Thread Mark Michelson

Acked-by: Mark Michelson 

On 7/6/19 6:45 AM, Lorenzo Bianconi wrote:

Run local logical flows first if the gw router port is scheduled
on the local chassis in order to properly manage snat traffic

Tested-by: Eran Kuris 
Acked-by: Numan Siddique 
Signed-off-by: Lorenzo Bianconi 
---
Changes since v2:
- fix compilation error
Changes since v1:
- add priority change in ovn-northd.8.xml
---
  ovn/northd/ovn-northd.8.xml | 3 ++-
  ovn/northd/ovn-northd.c | 7 +--
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 193aa210f..d2267de0e 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -2428,7 +2428,8 @@ nd_ns {
  
If the NAT rule cannot be handled in a distributed manner, then
the flow above is only programmed on the
-  redirect-chassis.
+  redirect-chassis increasing flow priority by 128 in
+  order to be run first
  
  
  

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index ba2719321..ce382ac89 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -6634,6 +6634,8 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
  count_1bits(ntohl(mask)) + 1,
  ds_cstr(&match), ds_cstr(&actions));
  } else {
+uint16_t priority = count_1bits(ntohl(mask)) + 1;
+
  /* Distributed router. */
  ds_clear(&match);
  ds_put_format(&match, "ip && ip4.src == %s"
@@ -6643,6 +6645,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
  if (!distributed && od->l3redirect_port) {
  /* Flows for NAT rules that are centralized are only
   * programmed on the "redirect-chassis". */
+priority += 128;
  ds_put_format(&match, " && is_chassis_resident(%s)",
od->l3redirect_port->json_key);
  }
@@ -6657,8 +6660,8 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap 
*ports,
   * nat->logical_ip with the longest mask gets a higher
   * priority. */
  ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
-  count_1bits(ntohl(mask)) + 1,
-  ds_cstr(), ds_cstr());
+  priority, ds_cstr(&match),
+  ds_cstr(&actions));
  }
  }
  



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ovn-northd: Add the option to pause and resume

2019-07-08 Thread Mark Michelson

Hi Numan,

The code changes all look good, but there are a few problems with the 
documentation. I've noted them down below.


On 7/8/19 11:41 AM, nusid...@redhat.com wrote:

From: Numan Siddique 

This patch adds 3 unixctl socket commands - pause, resume and is-paused.

Usage: ovs-appctl -t ovn-northd pause/resume/is-paused

This feature will be useful if the CMS wants to
   - deploy OVN DB servers in active/passive mode and
   - run ovn-northd on all these nodes and use unix ctl sockets to
 connect to the local OVN DB servers.

On the nodes where the OVN DB ovsdb-servers are in passive mode, the local
ovn-northds
will process the DB changes and calculate logical flows only to be thrown out
later, because write transactions are not allowed by these ovsdb-servers.
This results in unnecessary CPU usage.

With these commands, the CMS can pause ovn-northd on these nodes. A node
which becomes master can resume the ovn-northd.

This feature will be useful in ovn-kubernetes if the above deployment model
is chosen.

Signed-off-by: Numan Siddique 
---
v1 -> v2
===
   * Addressed the review comments from Ben and add more documentation
 about the runtime options added by this patch.
   * v1 had an issue - When paused, it was not even waking up to process
 the IDL updates. In v2, the main thread, wakes up to process any
 IDL updates, but doesn't do any logical flow computations.

  ovn/northd/ovn-northd.8.xml |  48 +++
  ovn/northd/ovn-northd.c | 117 
  tests/ovn-northd.at |  38 
  3 files changed, 177 insertions(+), 26 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index e6417220f..0766902db 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -70,6 +70,23 @@

  Causes ovn-northd to gracefully terminate.

+
+  pause
+  
+Pauses the ovn-northd operation from processing any Northbound and
+Southbound database changes.
+  
+
+  resume
+  
+Resumes the ovn-northd operation to process Northbound and
+Southbound database contents and generate logical flows.
+  
+
+  is-paused
+  
+Returns "true" if ovn-northd is currently paused, "false" otherwise.
+  

  
  
@@ -82,6 +99,37 @@

of ovn-northd will automatically take over.
  
  
+ Active-Standby with multiple OVN DB servers

+
+  You may run multiple OVN DB servers in an OVN deployment with:
+  
+
+  OVN DB servers deployed in active/passive mode with one active
+  and multiple passive ovsdb-servers.
+
+
+
+  ovn-northd also deployed on all thes nodes


"thes" should either be "these" or "the". I'm not sure which you intended.


+  using unix ctl sockets to connect to the local OVN DB servers.
+
+  
+
+
+
+  In such deployments, the ovn-northds on the passive nodes will process
+  the DB changes and calculate logical flows to be throw out later


s/throw/thrown/


+  because write txns are not allowed by the passive ovsdb-servers.


I think writing out the word "transaction" is preferred over "txn" for 
documentation.



+  It results in unncessary CPU usage.


s/unncessary/unnecessary/


+
+
+
+  With the help of runtime management command pause, you can
+  pause ovn-northd on these nodes. When a passive node
+  becomes master, you can use the runtime management command
+  resume to resume the ovn-northd to process the
+  DB changes.
+
+
  Logical Flow Table Structure
  
  

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 0b0a96a3a..05ddd60e3 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -50,6 +50,9 @@
  VLOG_DEFINE_THIS_MODULE(ovn_northd);
  
  static unixctl_cb_func ovn_northd_exit;

+static unixctl_cb_func ovn_northd_pause;
+static unixctl_cb_func ovn_northd_resume;
+static unixctl_cb_func ovn_northd_is_paused;
  
  struct northd_context {

  struct ovsdb_idl *ovnnb_idl;
@@ -8639,6 +8642,7 @@ main(int argc, char *argv[])
  struct unixctl_server *unixctl;
  int retval;
  bool exiting;
+bool paused;
  
  fatal_ignore_sigpipe();

  ovs_cmdl_proctitle_init(argc, argv);
@@ -8653,6 +8657,10 @@ main(int argc, char *argv[])
  exit(EXIT_FAILURE);
  }
  unixctl_command_register("exit", "", 0, 0, ovn_northd_exit, &exiting);
+unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, &paused);
+unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, &paused);
+unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused,
+ &paused);
  
  daemonize_complete();
  
@@ -8809,32 +8817,49 @@ main(int argc, char *argv[])
  
  /* Main loop. */

  exiting = false;
+paused = false;
  while (!exiting) {
-struct northd_context ctx = {
-

[ovs-dev] [PATCH v2] rhel: Fixed a bug for checking the correct major version and revision.

2019-07-08 Thread Ashish Varma
Fixed a bug where a kernel with major version 3.10 and a major revision other
than 327, 693, or 957 should fall through to the default else branch at the
end. In the current code, the default else branch is not executed for kernels
with major version 3.10 and a major revision other than 327/693/957, resulting
in a failure to load the kernel module.
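As a sanity check, here is a standalone sketch of the corrected conditional with mocked version variables (check_kmod_path is a hypothetical helper, not part of the spec file); a 3.10 kernel with any other major revision now reaches the default branch:

```shell
# Hypothetical standalone mirror of the corrected spec-file logic.
check_kmod_path() {
    mainline_major=$1; mainline_minor=$2; major_rev=$3
    if [ "$mainline_major" = "3" ] && [ "$mainline_minor" = "10" ] &&
       { [ "$major_rev" = "327" ] || [ "$major_rev" = "693" ] || \
         [ "$major_rev" = "957" ]; }; then
        # RHEL 7.2, 7.4 and 7.6: run ovs-kmod-manage.sh
        echo "managed"
    else
        # Everything else, including 3.10 with other revisions: default path.
        echo "default"
    fi
}

check_kmod_path 3 10 957   # -> managed
check_kmod_path 3 10 862   # -> default (wrongly skipped before the fix)
check_kmod_path 4 18 80    # -> default
```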

Fixes: 402efbe4e176 ("rhel: Add 4.12 kernel support in ovs-kmod-manage.sh")
Signed-off-by: Ashish Varma 
---
v1-v2:
Added the Fixes tag.
---
 rhel/openvswitch-kmod-fedora.spec.in | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/rhel/openvswitch-kmod-fedora.spec.in 
b/rhel/openvswitch-kmod-fedora.spec.in
index 92d763f..27f443a 100644
--- a/rhel/openvswitch-kmod-fedora.spec.in
+++ b/rhel/openvswitch-kmod-fedora.spec.in
@@ -90,13 +90,12 @@ if grep -qs "suse" /etc/os-release; then
 if [ -x "%{_datadir}/openvswitch/scripts/ovs-kmod-manage.sh" ]; then
 %{_datadir}/openvswitch/scripts/ovs-kmod-manage.sh
 fi
-elif [ "$mainline_major" = "3" ] && [ "$mainline_minor" = "10" ]; then
-if [ "$major_rev" = "327" ] || [ "$major_rev" = "693" ] || \
-   [ "$major_rev" = "957" ]; then
-# For RHEL 7.2, 7.4 and 7.6
-if [ -x "%{_datadir}/openvswitch/scripts/ovs-kmod-manage.sh" ]; then
-%{_datadir}/openvswitch/scripts/ovs-kmod-manage.sh
-fi
+elif [ "$mainline_major" = "3" ] && [ "$mainline_minor" = "10" ] &&
+ { [ "$major_rev" = "327" ] || [ "$major_rev" = "693" ] || \
+   [ "$major_rev" = "957" ]; }; then
+# For RHEL 7.2, 7.4 and 7.6
+if [ -x "%{_datadir}/openvswitch/scripts/ovs-kmod-manage.sh" ]; then
+%{_datadir}/openvswitch/scripts/ovs-kmod-manage.sh
 fi
 else
 # Ensure that modprobe will find our modules.
-- 
2.7.4



[ovs-dev] [PATCH v2] ovn-northd: Add the option to pause and resume

2019-07-08 Thread nusiddiq
From: Numan Siddique 

This patch adds 3 unixctl socket commands - pause, resume and is-paused.

Usage: ovs-appctl -t ovn-northd pause/resume/is-paused

This feature will be useful if the CMS wants to
  - deploy OVN DB servers in active/passive mode and
  - run ovn-northd on all these nodes and use unix ctl sockets to
connect to the local OVN DB servers.

On the nodes where the OVN DB ovsdb-servers are in passive mode, the local
ovn-northds
will process the DB changes and calculate logical flows only to be thrown out
later, because write transactions are not allowed by these ovsdb-servers.
This results in unnecessary CPU usage.

With these commands, the CMS can pause ovn-northd on these nodes. A node
which becomes master can resume the ovn-northd.

This feature will be useful in ovn-kubernetes if the above deployment model
is chosen.

Signed-off-by: Numan Siddique 
---
v1 -> v2
===
  * Addressed the review comments from Ben and add more documentation
about the runtime options added by this patch.
  * v1 had an issue - When paused, it was not even waking up to process
the IDL updates. In v2, the main thread, wakes up to process any
IDL updates, but doesn't do any logical flow computations.
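As a toy illustration of the v2 behaviour described above — the main thread keeps waking up to process IDL updates but skips the logical flow computation while paused — consider this sketch (all names are illustrative, not the real ovn-northd code):

```c
#include <stdbool.h>

static bool paused;             /* toggled by the pause/resume commands */
static int flows_computed;      /* counts expensive recomputations */

/* Always runs, even when paused: keeps the IDL contents in sync. */
static void process_idl_updates(void)
{
}

/* The expensive part that is skipped while paused. */
static void compute_logical_flows(void)
{
    flows_computed++;
}

/* One iteration of the main loop: wake up, sync the IDL, and only
 * recompute flows when not paused. */
static void main_loop_iteration(void)
{
    process_idl_updates();
    if (!paused) {
        compute_logical_flows();
    }
}
```

This is the difference from v1, where a paused instance did not even wake up to process IDL updates.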

 ovn/northd/ovn-northd.8.xml |  48 +++
 ovn/northd/ovn-northd.c | 117 
 tests/ovn-northd.at |  38 
 3 files changed, 177 insertions(+), 26 deletions(-)

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index e6417220f..0766902db 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -70,6 +70,23 @@
   
 Causes ovn-northd to gracefully terminate.
   
+
+  pause
+  
+Pauses the ovn-northd operation from processing any Northbound and
+Southbound database changes.
+  
+
+  resume
+  
+Resumes the ovn-northd operation to process Northbound and
+Southbound database contents and generate logical flows.
+  
+
+  is-paused
+  
+Returns "true" if ovn-northd is currently paused, "false" otherwise.
+  
   
 
 
@@ -82,6 +99,37 @@
   of ovn-northd will automatically take over.
 
 
+ Active-Standby with multiple OVN DB servers
+
+  You may run multiple OVN DB servers in an OVN deployment with:
+  
+
+  OVN DB servers deployed in active/passive mode with one active
+  and multiple passive ovsdb-servers.
+
+
+
+  ovn-northd also deployed on all thes nodes
+  using unix ctl sockets to connect to the local OVN DB servers.
+
+  
+
+
+
+  In such deployments, the ovn-northds on the passive nodes will process
+  the DB changes and calculate logical flows to be throw out later
+  because write txns are not allowed by the passive ovsdb-servers.
+  It results in unncessary CPU usage.
+
+
+
+  With the help of runtime management command pause, you can
+  pause ovn-northd on these nodes. When a passive node
+  becomes master, you can use the runtime management command
+  resume to resume the ovn-northd to process the
+  DB changes.
+
+
 Logical Flow Table Structure
 
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 0b0a96a3a..05ddd60e3 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -50,6 +50,9 @@
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
 
 static unixctl_cb_func ovn_northd_exit;
+static unixctl_cb_func ovn_northd_pause;
+static unixctl_cb_func ovn_northd_resume;
+static unixctl_cb_func ovn_northd_is_paused;
 
 struct northd_context {
 struct ovsdb_idl *ovnnb_idl;
@@ -8639,6 +8642,7 @@ main(int argc, char *argv[])
 struct unixctl_server *unixctl;
 int retval;
 bool exiting;
+bool paused;
 
 fatal_ignore_sigpipe();
 ovs_cmdl_proctitle_init(argc, argv);
@@ -8653,6 +8657,10 @@ main(int argc, char *argv[])
 exit(EXIT_FAILURE);
 }
 unixctl_command_register("exit", "", 0, 0, ovn_northd_exit, &exiting);
+unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, &paused);
+unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, &paused);
+unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused,
+ &paused);
 
 daemonize_complete();
 
@@ -8809,32 +8817,49 @@ main(int argc, char *argv[])
 
 /* Main loop. */
 exiting = false;
+paused = false;
 while (!exiting) {
-        struct northd_context ctx = {
-            .ovnnb_idl = ovnnb_idl_loop.idl,
-            .ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
-            .ovnsb_idl = ovnsb_idl_loop.idl,
-            .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
-            .sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name,
-        };
-
-        if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
-            VLOG_INFO("ovn-northd lock acquired. "
-                      "This ovn-northd instance is 

[ovs-dev] "soft freeze" now in effect for master branch

2019-07-08 Thread Ben Pfaff
Hi.  As described in Documentation/internals/release-process.rst, we are
in a "soft freeze" of the master branch:

   During the freeze, we ask committers to refrain from applying patches that
   add new features unless those patches were already being publicly discussed
   and reviewed before the freeze began.  Bug fixes are welcome at any time.
   Please propose and discuss exceptions on ovs-dev.

We're a week late entering into this (because I forgot).  We should
branch for version 2.12 a week from now, on July 15, but perhaps we'll
delay it by a week also due to the oversight.


Re: [ovs-dev] [PATCH v3 2/5] dpif-netdev: Trigger parallel pmd reloads.

2019-07-08 Thread Ilya Maximets
On 08.07.2019 17:55, Ilya Maximets wrote:
> On 04.07.2019 14:59, David Marchand wrote:
>> pmd reloads are currently serialised at each step calling
>> reload_affected_pmds.
>> Any pmd processing packets, waiting on a mutex, etc. will make other
>> pmd threads wait for a delay that can be nondeterministic when syscalls
>> add up.
>>
>> Switch to a little busy loop on the control thread using the existing
>> per-pmd reload boolean.
>>
>> The memory order on this atomic is rel-acq to have an explicit
>> synchronisation between the pmd threads and the control thread.
>>
>> Signed-off-by: David Marchand 
>> ---
>> Changelog since v2:
>> - remove unneeded synchronisation on pmd thread join (Ilya)
>> - "inlined" synchronisation loop in reload_affected_pmds
>>
>> Changelog since v1:
>> - removed the introduced reloading_pmds atomic and reuse the existing
>>   pmd->reload boolean as a single synchronisation point (Ilya)
>>
>> Changelog since RFC v1:
>> - added memory ordering on 'reloading_pmds' atomic to serve as a
>>   synchronisation point between pmd threads and control thread
>>
>> ---
>>  lib/dpif-netdev.c | 26 --
>>  1 file changed, 12 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 8eeec63..415107b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -649,9 +649,6 @@ struct dp_netdev_pmd_thread {
>>  struct ovs_refcount ref_cnt;/* Every reference must be refcount'ed. 
>> */
>>  struct cmap_node node;  /* In 'dp->poll_threads'. */
>>  
>> -    pthread_cond_t cond;            /* For synchronizing pmd thread reload. */
>> -    struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
>> -
>>  /* Per thread exact-match cache.  Note, the instance for cpu core
>>   * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
>>   * need to be protected by 'non_pmd_mutex'.  Every other instance
>> @@ -1758,11 +1755,8 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread 
>> *pmd)
>>  return;
>>  }
>>  
>> -    ovs_mutex_lock(&pmd->cond_mutex);
>>      seq_change(pmd->reload_seq);
>>      atomic_store_explicit(&pmd->reload, true, memory_order_release);
>> -    ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
>> -    ovs_mutex_unlock(&pmd->cond_mutex);
>>  }
>>  
>>  static uint32_t
>> @@ -4655,6 +4649,17 @@ reload_affected_pmds(struct dp_netdev *dp)
>>  if (pmd->need_reload) {
>>  flow_mark_flush(pmd);
>>  dp_netdev_reload_pmd__(pmd);
>> +}
>> +}
>> +
>> +    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>> +        if (pmd->need_reload) {
>> +            bool reload;
>> +
>> +            do {
>> +                atomic_read_explicit(&pmd->reload, &reload,
>> +                                     memory_order_acquire);
>> +            } while (reload);
> 
> If we ever set 'need_reload' for a non-PMD thread, there will be
> a kind of deadlock. We never waited for the non-PMD thread previously,
> because dp_netdev_reload_pmd__() has a special case for it.

You may try disabling EMC for an internal port to reproduce this.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v3 2/5] dpif-netdev: Trigger parallel pmd reloads.

2019-07-08 Thread Ilya Maximets
On 04.07.2019 14:59, David Marchand wrote:
> pmd reloads are currently serialised at each step calling
> reload_affected_pmds.
> Any pmd processing packets, waiting on a mutex, etc. will make other
> pmd threads wait for a delay that can be nondeterministic when syscalls
> add up.
> 
> Switch to a little busy loop on the control thread using the existing
> per-pmd reload boolean.
> 
> The memory order on this atomic is rel-acq to have an explicit
> synchronisation between the pmd threads and the control thread.
> 
> Signed-off-by: David Marchand 
> ---
> Changelog since v2:
> - remove unneeded synchronisation on pmd thread join (Ilya)
> - "inlined" synchronisation loop in reload_affected_pmds
> 
> Changelog since v1:
> - removed the introduced reloading_pmds atomic and reuse the existing
>   pmd->reload boolean as a single synchronisation point (Ilya)
> 
> Changelog since RFC v1:
> - added memory ordering on 'reloading_pmds' atomic to serve as a
>   synchronisation point between pmd threads and control thread
> 
> ---
>  lib/dpif-netdev.c | 26 --
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 8eeec63..415107b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -649,9 +649,6 @@ struct dp_netdev_pmd_thread {
>  struct ovs_refcount ref_cnt;/* Every reference must be refcount'ed. 
> */
>  struct cmap_node node;  /* In 'dp->poll_threads'. */
>  
> -    pthread_cond_t cond;            /* For synchronizing pmd thread reload. */
> -    struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
> -
>  /* Per thread exact-match cache.  Note, the instance for cpu core
>   * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
>   * need to be protected by 'non_pmd_mutex'.  Every other instance
> @@ -1758,11 +1755,8 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread 
> *pmd)
>  return;
>  }
>  
> -    ovs_mutex_lock(&pmd->cond_mutex);
>      seq_change(pmd->reload_seq);
>      atomic_store_explicit(&pmd->reload, true, memory_order_release);
> -    ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
> -    ovs_mutex_unlock(&pmd->cond_mutex);
>  }
>  
>  static uint32_t
> @@ -4655,6 +4649,17 @@ reload_affected_pmds(struct dp_netdev *dp)
>  if (pmd->need_reload) {
>  flow_mark_flush(pmd);
>  dp_netdev_reload_pmd__(pmd);
> +}
> +}
> +
> +    CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> +        if (pmd->need_reload) {
> +            bool reload;
> +
> +            do {
> +                atomic_read_explicit(&pmd->reload, &reload,
> +                                     memory_order_acquire);
> +            } while (reload);

If we ever set 'need_reload' for a non-PMD thread, there will be
a kind of deadlock. We never waited for the non-PMD thread previously,
because dp_netdev_reload_pmd__() has a special case for it.

>  pmd->need_reload = false;
>  }
>  }
> @@ -5842,11 +5847,8 @@ dpif_netdev_enable_upcall(struct dpif *dpif)
>  static void
>  dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd)
>  {
> -    ovs_mutex_lock(&pmd->cond_mutex);
> -    atomic_store_relaxed(&pmd->reload, false);
>      pmd->last_reload_seq = seq_read(pmd->reload_seq);
> -    xpthread_cond_signal(&pmd->cond);
> -    ovs_mutex_unlock(&pmd->cond_mutex);
> +    atomic_store_explicit(&pmd->reload, false, memory_order_release);
>  }
>  
>  /* Finds and refs the dp_netdev_pmd_thread on core 'core_id'.  Returns
> @@ -5931,8 +5933,6 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread 
> *pmd, struct dp_netdev *dp,
>  pmd->reload_seq = seq_create();
>  pmd->last_reload_seq = seq_read(pmd->reload_seq);
>  atomic_init(>reload, false);
> -    xpthread_cond_init(&pmd->cond, NULL);
> -    ovs_mutex_init(&pmd->cond_mutex);
>      ovs_mutex_init(&pmd->flow_mutex);
>      ovs_mutex_init(&pmd->port_mutex);
>      cmap_init(&pmd->flow_table);
> @@ -5975,8 +5975,6 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
>      cmap_destroy(&pmd->flow_table);
>      ovs_mutex_destroy(&pmd->flow_mutex);
>      seq_destroy(pmd->reload_seq);
> -    xpthread_cond_destroy(&pmd->cond);
> -    ovs_mutex_destroy(&pmd->cond_mutex);
>      ovs_mutex_destroy(&pmd->port_mutex);
>  free(pmd);
>  }
> 


Re: [ovs-dev] [PATCH] netdev-dpdk: Add a statistic on vhost tx contention.

2019-07-08 Thread Ilya Maximets
On 07.07.2019 19:13, David Marchand wrote:
> Add a statistic to help diagnose contention on the vhost txqs.
> This is usually seen as dropped packets on the physical ports for rates
> that are usually handled fine by OVS.
> 
> Signed-off-by: David Marchand 
> ---
> This patch applies on top of Kevin "vhost-retry" series [1].
> 
> 1: https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=70482
> ---
>  Documentation/topics/dpdk/vhost-user.rst | 25 +
>  lib/netdev-dpdk.c| 20 ++--
>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/vhost-user.rst 
> b/Documentation/topics/dpdk/vhost-user.rst
> index 724aa62..e276b21 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -553,6 +553,31 @@ shown with::
>  
>$ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
>  
> +vhost tx contention
> +---
> +
> +Contrary to physical ports, where the number of Transmit queues is decided at
> +port configuration time and does not change once set up, the number of
> +Transmit queues for a vhost-user or vhost-user-client interface is negotiated
> +with the guest when its driver enables a virtio interface, and OVS has no
> +control over it.
> +
> +Each PMD thread needs a Transmit queue to send packets on such a port.
> +To accommodate the cases where the number of PMD threads differs from the
> +number of Transmit queues enabled by a Virtual Machine, the Transmit queues
> +are distributed among the PMD threads and their accesses are protected by a
> +spinlock.
> +
> +When contention occurs (because two PMD threads are mapped to the same
> +Transmit queue), it can be hard to diagnose. It usually shows up as OVS
> +taking more cycles to handle each packet, and as packets getting dropped on
> +the receive side (rx_dropped statistic on the physical ports).
> +
> +A statistic for the port is available to catch those cases::
> +
> +  ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_contentions


I'm concerned about this statistic. 'tx_retries' had a clear connection
to networking and the end user is able to understand what it means,
but this one is fully abstracted from the network and requires deep
knowledge of the current netdev-dpdk implementation to understand what
it means.

I see 2 options for counters like this:

  1. coverage counter.
  2. PMD perf. stats.

The first one is definitely the correct place for such statistics, but we'll
lose some information like the port where this happened.

The second one seems suitable too, but it's hard to expose information to
the upper layer from netdev (see the awful vhost qfill). This is, however,
could be solved by re-implementing the dpif-netdev-perf with the 'coverage
counters' like approach, how I suggested previously while discussing the
original 'dpif-netdev-perf' patch-set.

So, for now, I'm voting for a simple coverage counter.


More thoughts:

Main idea: In OVS we already have too many ways of counting different events!

We have a 'stats_lock'; please don't introduce an "atomic" way of collecting
statistics. If you wish to have atomics, please rewrite all of the stats in
netdev-dpdk to atomics (not a good idea, probably).

Regarding implementation:
Why are these stats per-queue? As you're using atomics there is no need to
collect them separately. And you will not avoid cache invalidation, because
tx queues are not cacheline aligned/padded.

Since you're using atomic operations, the variable should be atomic_uint64_t.


Best regards, Ilya Maximets.
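For reference, the kind of per-event counting being debated boils down to a trylock pattern along these lines — a pthread mutex stands in for the DPDK spinlock, and every name here is illustrative rather than taken from the patch:

```c
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t txq_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_ulong tx_contentions;

/* Transmit path: if the queue lock is already held by another thread,
 * count the contention event, then wait for the lock as usual. */
static void txq_send(void)
{
    if (pthread_mutex_trylock(&txq_lock) != 0) {
        atomic_fetch_add_explicit(&tx_contentions, 1, memory_order_relaxed);
        pthread_mutex_lock(&txq_lock);
    }
    /* ...burst-transmit packets... */
    pthread_mutex_unlock(&txq_lock);
}
```

An uncontended sender never touches the counter, so the fast path stays cheap; whether the counter should live per-queue, per-port, or as a coverage counter is exactly the question raised above.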

> +
>  vhost-user Dequeue Zero Copy (experimental)
>  ---
>  
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0e70e44..844fb3c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -139,10 +139,11 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / 
> ROUND_DOWN_POW2(MAX_NB_MBUF / MIN_NB_MBUF))
>  #define XSTAT_RX_JABBER_ERRORS   "rx_jabber_errors"
>  
>  /* Size of vHost custom stats. */
> -#define VHOST_CUSTOM_STATS_SIZE  1
> +#define VHOST_CUSTOM_STATS_SIZE  2
>  
>  /* Names of vHost custom stats. */
>  #define VHOST_STAT_TX_RETRIES"tx_retries"
> +#define VHOST_STAT_TX_CONTENTIONS"tx_contentions"
>  
>  #define SOCKET0  0
>  
> @@ -340,6 +341,8 @@ struct dpdk_tx_queue {
>  * pmd threads (see 'concurrent_txq'). */
>  int map;   /* Mapping of configured vhost-user queues
>  * to enabled by guest. */
> +uint64_t tx_contentions;   /* Number of times a pmd has been waiting
> +* for another to xmit on this queue. */
>  };
>  
>  /* dpdk has no way to remove dpdk ring ethernet devices
> @@ -2387,7 +2390,10 @@ 

Re: [ovs-dev] [PATCH] tnl-neigh-cache: Purge learnt neighbors when port/bridge is deleted

2019-07-08 Thread Flavio Leitner via dev
On Tue, Jul 02, 2019 at 08:50:16PM -0400, Vasu Dasari wrote:
> Hi Flavio,
> 
> I am trying to emulate the test case scenario you mentioned earlier, wherein
> we need to clean the neighbor cache when an external interface goes
> down. To study that, I wrote a small script based on the test case in
> system-traffic.at, "datapath - ping over vxlan tunnel". And this experiment
> gave me good experience with the netdev and kmod
> implementations of VxLAN.
> 
> Various tests I have performed using the new script include,
> 1. ovs-vswitchd started normally, for performing kernel mode VxLAN tests.
> And in this mode, tested following cases
>a. VxLAN interface being part of OVS.
>b. VxLAN interface external to OVS
>I see that in kernel mode of operation, native tunneling is off, and
> hence no ARP entries are learnt in tnl-neigh-cache. Please refer to
> ofproto-dpif-xlate.c:terminate_native_tunnel(). So, whatever changes we are
> making here(as part of my commit) are not effected in kernel mode of
> operation.

In either case my understanding is that OvS doesn't care about the
VXLAN and just hands off the packet to the interface. In another
words, it's the interface's job to encapsulate the traffic and
that's why it doesn't impact on the tnl-neigh-cache.


> 2. ovs-vswitchd started with "--disable-system" option for performing
> usermode VxLAN tests. And in this mode, tested following cases
>a. VxLAN interface being part of OVS. This test case works. In this
> case, entries are cleanly removed by user deleting the ports from the
> bridge.

Right, this is the scenario you tested first, if I recall correctly.

>b. VxLAN interface external to OVS. I could not get this case to work.

I think OvS will inject the packet to the VXLAN interface, but I
never tried this as it seems unusual scenario to have.

> 3. ovs-vswitchd started normally(without --disable-system option) for
> performing usermode VxLAN tests. And in this mode, tested following cases
>a. VxLAN interface being part of OVS. This test case works. In this
> case, entries are cleanly removed by user deleting the ports from the
> bridge.
>b. VxLAN interface external to OVS. I could not get this case to work.

This looks like a perfect copy from item #2 which is okay, just checking
if there was no copy error somewhere :-)

 
> I could not get the 2b and 3b use cases to work at all. Do you expect these to work
> normally?  The reason I ask this is, as I understand from the code, even
> though "ovs/route/show" shows the route to remote vtep, OVS does not know
> which ODP port to take to send the packet out of. Please refer to
> ofproto-dpif-xlate.c:native_tunnel_output() and tnl_route_lookup_flow() and
> hence packet transmission fails with "native tunnel routing failed".
> 
> If you agree that 2b and 3b are not supported use cases, then I can submit
> my code changes as per your review comments.

Let me step back a bit, because the scenario I described initially had
the egress device (ethX) with the endpoint address outside of the
bridge, while the VXLAN interface was always part of the bridge.
Therefore we had two scenarios to cover, the difference being whether
the endpoint address is part of the OvS bridge or not.

However, looking more closely to the code if the VXLAN is part of the
bridge in userspace, then native tunnel will need the tnl-neigh-cache
to build the headers.  Then it sends out the ARP Request packet to the
datapath only, so it doesn't seem to support endpoint addresses outside
of the OvS. I guess your initial patch covering only interfaces as part
of OvS was good enough then.

Do you agree with that?

Thanks!
fbl


> 
> Please let me know what you think of.
> -Vasu
> 
> *Vasu Dasari*
> 
> 
> On Mon, Jun 24, 2019 at 6:15 PM Flavio Leitner  wrote:
> 
> > On Mon, Jun 24, 2019 at 05:43:26PM -0400, Vasu Dasari wrote:
> > > On Mon, Jun 24, 2019 at 3:58 PM Flavio Leitner  wrote:
> > > > On Wed, Jun 19, 2019 at 11:02:07PM -0400, Vasu Dasari wrote:
> > > > > +{
> > > > > +    struct tnl_neigh_entry *neigh;
> > > > > +    bool changed = false;
> > > > > +
> > > > > +    ovs_mutex_lock(&mutex);
> > > > > +    CMAP_FOR_EACH (neigh, cmap_node, &table) {
> > > > > +        if (nullable_string_is_equal(neigh->br_name, br_name)) {
> > > > > +            tnl_neigh_delete(neigh);
> > > > > +            changed = true;
> > > >
> > > > Do you expect to match on additional entries? Otherwise it
> > > > could bail out here.
> > > >
> > > >
> > > Say there are two or more neighbors on the port or on bridge, then by
> > > bailing out we would be missing others. So, will leave it there.
> >
> > You're right.
> >
> > [...]
> > > > However, as I mentioned in the discussion, the tnl IP address could
> > > > be on a device that doesn't belong to OVS at all. For example:
> > [...]
> > > > The tnl endpoint must have a valid route, so I suggest to hook the
> > > > tnl_neigh_cache_flush into route-table.c which receives a notification
> > > > when a route 

Re: [ovs-dev] [PATCH v2] Avoid dp_hash recirculation for balance-tcp bond selection mode

2019-07-08 Thread Ilya Maximets
On 25.06.2019 12:28, Vishal Deep Ajmera wrote:
> Problem:
> 
> In OVS-DPDK, flows with output over a bond interface of type "balance-tcp"
> (using a hash on TCP/UDP 5-tuple) get translated by the ofproto layer into
> "HASH" and "RECIRC" datapath actions. After recirculation, the packet is
> forwarded to the bond member port based on 8-bits of the datapath hash
> value computed through dp_hash. This causes performance degradation in the
> following ways:
> 
> 1. The recirculation of the packet implies another lookup of the packet's
> flow key in the exact match cache (EMC) and potentially Megaflow classifier
> (DPCLS). This is the biggest cost factor.
> 
> 2. The recirculated packets have a new "RSS" hash and compete with the
> original packets for the scarce number of EMC slots. This implies more
> EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
> 
> 3. The 256 extra megaflow entries per bond for dp_hash bond selection put
> additional load on the revalidation threads.
> 
> Owing to this performance degradation, deployments stick to "balance-slb"
> bond mode even though it does not do active-active load balancing for
> VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same
> source MAC address.
> 
> Proposed optimization:
> --
> This proposal introduces a new load-balancing output action instead of
> recirculation.
> 
> Maintain one table per bond (this could just be an array of uint16_t's) and
> program it from the ofproto layer the same way internal flows are created
> today, one entry for each possible hash value (256 entries). Use this table
> to load-balance flows as part of output action processing.
> 
> Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules()
> -> bond_may_recirc() and compose_output_action__() generate
> "dp_hash(hash_l4(0))" and "recirc()" actions. In this case the
> RecircID identifies the bond. For the recirculated packets the ofproto layer
> installs megaflow entries that match on RecircID and masked dp_hash and send
> them to the corresponding output port.
> 
> Instead, we will now generate the actions as
> "hash(l4(0)),lb_output(bond,)"
> 
> This combines hash computation (only if needed, else the RSS hash is re-used)
> and inline load balancing over the bond. This action is used *only* for
> balance-tcp bonds in the OVS-DPDK datapath (the OVS kernel datapath remains
> unchanged).
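To make the proposal concrete, here is a minimal sketch of the per-bond lookup that lb_output would perform (names such as `bond_cache` and `lb_output_port` are invented for illustration; the real patch programs the table from the ofproto layer):

```c
#include <stdint.h>

#define BOND_BUCKETS 256            /* one slot per masked dp_hash value */

/* Hypothetical per-PMD bond cache: one member port per hash bucket,
 * mirroring the 256 dp_hash megaflows installed today. */
struct bond_cache {
    uint16_t member_port[BOND_BUCKETS];
};

/* Inline load balancing: select the output port directly from the packet
 * hash, avoiding the hash/recirc round trip and the second flow lookup. */
static inline uint16_t
lb_output_port(const struct bond_cache *bond, uint32_t packet_hash)
{
    return bond->member_port[packet_hash & (BOND_BUCKETS - 1)];
}
```

With the flow dumps above, hash 0x113683bd masks to bucket 0xbd, so member selection becomes a single array read instead of a DPCLS lookup on recirc_id/dp_hash.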
> 
> Example:
> 
> Current scheme:
> ---
> With 1 IP-UDP flow:
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>  packets:2828969, bytes:181054016, used:0.000s, 
> actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0x113683bd/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:2828937, bytes:181051968, used:0.000s, actions:2
> 
> With 8 IP-UDP flows (with random UDP src ports, all hitting the same DPCLS rule):
> 
> flow-dump from pmd on cpu core: 2
> recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>  packets:2674009, bytes:171136576, used:0.000s, 
> actions:hash(hash_l4(0)),recirc(0x1)
> 
> recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:377395, bytes:24153280, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb236c260/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:333486, bytes:21343104, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0x7d89eb18/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:348461, bytes:22301504, used:0.000s, actions:1
> recirc_id(0x1),dp_hash(0xa78d75df/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:633353, bytes:40534592, used:0.000s, actions:2
> recirc_id(0x1),dp_hash(0xb58d846f/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:319901, bytes:20473664, used:0.001s, actions:2
> recirc_id(0x1),dp_hash(0x24534406/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:334985, bytes:21439040, used:0.001s, actions:1
> recirc_id(0x1),dp_hash(0x3cf32550/0xff),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
>  packets:326404, bytes:20889856, used:0.001s, actions:1
> 
> New scheme:
> ---
> We can make do with a single flow entry (for any number of new flows):
> 
> in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:02:14:01,dst=0c:c4:7a:58:f0:2b),eth_type(0x0800),ipv4(frag=no),
>  packets:2674009, bytes:171136576, used:0.000s, 
> actions:hash(l4(0)),lb_output(bond,1)
> 
> A new CLI has been added to dump the per-PMD bond cache as given below.
> 
> "sudo ovs-appctl dpif-netdev/pmd-bond-show"
> 
> root@ubuntu-190:performance_scripts # sudo ovs-appctl 
> dpif-netdev/pmd-bond-show
> pmd thread numa_id 0 core_id 4:
> Bond cache:
> 

Re: [ovs-dev] [PATCH v4 0/3] vhost tx retry updates

2019-07-08 Thread Ian Stokes

On 7/2/2019 1:32 AM, Kevin Traynor wrote:

v4:
- 1/2 New patch: Move vhost tx retries doc to a separate section (David)

- 2/3
-- Changed tx_retries to be a custom stat for vhost (Ilya)
-- Added back in MIN() that was dropped in v2, as in retesting I
saw it is needed when the retry limit is reached to prevent
an accounting error
-- note: this is not a typo, it was just out of date
-/* 44 pad bytes here. */
+/* 4 pad bytes here. */
-- Removed acks due to the amount of changes made



Thanks all for the work reviewing this. I've made the final amendments 
suggested and pushed to master.


Thanks
Ian
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Add a statistic on vhost tx contention.

2019-07-08 Thread David Marchand
On Mon, Jul 8, 2019 at 1:41 PM Eelco Chaudron  wrote:

>
>
> On 7 Jul 2019, at 18:13, David Marchand wrote:
>
> > Add a statistic to help diagnose contention on the vhost txqs.
> > This is usually seen as dropped packets on the physical ports for
> > rates
> > that are usually handled fine by OVS.
> >
> > Signed-off-by: David Marchand 
> > ---
> > This patch applies on top of Kevin "vhost-retry" series [1].
> >
> > 1:
> > https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=70482
> > ---
> >  Documentation/topics/dpdk/vhost-user.rst | 25
> > +
> >  lib/netdev-dpdk.c| 20 ++--
> >  2 files changed, 43 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/topics/dpdk/vhost-user.rst
> > b/Documentation/topics/dpdk/vhost-user.rst
> > index 724aa62..e276b21 100644
> > --- a/Documentation/topics/dpdk/vhost-user.rst
> > +++ b/Documentation/topics/dpdk/vhost-user.rst
> > @@ -553,6 +553,31 @@ shown with::
> >
> >$ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries
> >
> > +vhost tx contention
> > +---
> > +
> > +Contrary to physical ports where the number of Transmit queues is
> > decided at
> > +the configuration of the port and does not change once set up, the
> > number of
> > +Transmit queues for a vhost-user or vhost-user-client interface is
> > negotiated
> > +with the guest when its driver enables a virtio interface and OVS has
> > no
> > +control over it.
> > +
> > +Each PMD thread needs a Transmit queue to send packets on such a
> > port.
> > +To accommodate the cases where the number of PMD threads differs
> > from the
>
> Maybe make it more explicit, i.e. where the number of PMD threads is
> higher than the number of tx queues.
>

Ok, I will try to re-phrase this in v2.
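For illustration, the situation the doc describes (more PMD threads than guest-enabled txqs) can be sketched as a simple modulo mapping; the actual netdev-dpdk code distributes queues through the per-txq 'map' array, so the helper names below are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical mapping of PMD threads onto the txqs the guest enabled.
 * When PMDs outnumber enabled txqs, at least two PMDs share a queue,
 * which is why each send must take the per-queue spinlock. */
static inline int
txq_for_pmd(int pmd_idx, int n_enabled_txq)
{
    return pmd_idx % n_enabled_txq;
}

/* Contention is possible exactly when some queue is shared. */
static inline bool
contention_possible(int n_pmds, int n_enabled_txq)
{
    return n_pmds > n_enabled_txq;
}
```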



> > +number of Transmit queues enabled by a Virtual Machine, the Transmit
> > queues are
> > +distributed among the PMD threads and their accesses are protected by
> > a
> > +spinlock.
> > +
> > +When contention occurs (because two PMD threads are mapped to the
> > same Transmit
> > +queue), it can be hard to understand this phenomenon.
> > +It usually will be seen as OVS taking more cycles to handle the
> > packet and can
> > +be seen as packets getting dropped on the receive side (rx_dropped
> > statistic
> > +on the physical ports).
> > +
> > +A statistic for the port is available to catch those cases::
> > +
> > +  ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_contentions
> > +
> >  vhost-user Dequeue Zero Copy (experimental)
> >  ---
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 0e70e44..844fb3c 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -139,10 +139,11 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
> > ROUND_DOWN_POW2(MAX_NB_MBUF / MIN_NB_MBUF))
> >  #define XSTAT_RX_JABBER_ERRORS   "rx_jabber_errors"
> >
> >  /* Size of vHost custom stats. */
> > -#define VHOST_CUSTOM_STATS_SIZE  1
> > +#define VHOST_CUSTOM_STATS_SIZE  2
> >
> >  /* Names of vHost custom stats. */
> >  #define VHOST_STAT_TX_RETRIES"tx_retries"
> > +#define VHOST_STAT_TX_CONTENTIONS"tx_contentions"
> >
> >  #define SOCKET0  0
> >
> > @@ -340,6 +341,8 @@ struct dpdk_tx_queue {
> >  * pmd threads (see
> > 'concurrent_txq'). */
> >  int map;   /* Mapping of configured
> > vhost-user queues
> >  * to enabled by guest. */
> > +uint64_t tx_contentions;   /* Number of times a pmd has been
> > waiting
> > +* for another to xmit on this
> > queue. */
> >  };
> >
> >  /* dpdk has no way to remove dpdk ring ethernet devices
> > @@ -2387,7 +2390,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
> > int qid,
> >  goto out;
> >  }
> >
> > -rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> > +if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
> > +rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> > +atomic_count_inc64(&dev->tx_q[qid].tx_contentions);
>
> I would swap the two lines above, as the atomic will probably finish
> before the lock is free’ed.
>

Indeed, better.
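The agreed ordering (count the contention first, then block on the lock) can be sketched with C11 atomics standing in for the DPDK spinlock; this is a simplified illustration, not the actual netdev-dpdk code:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for 'struct dpdk_tx_queue': an atomic_flag plays
 * the role of rte_spinlock_t. */
struct txq {
    atomic_flag lock;
    atomic_uint_fast64_t tx_contentions;
};

/* Try the fast path first; on failure, record the contention and only
 * then block, so the counter update is not serialized behind the lock. */
static void
txq_lock_counting(struct txq *q)
{
    if (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire)) {
        atomic_fetch_add_explicit(&q->tx_contentions, 1,
                                  memory_order_relaxed);
        while (atomic_flag_test_and_set_explicit(&q->lock,
                                                 memory_order_acquire)) {
            /* Spin until the other PMD releases the queue. */
        }
    }
}

static void
txq_unlock(struct txq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}
```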


> > +}
> >
> >  cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
> >  /* Check has QoS has been configured for the netdev */
> > @@ -2878,6 +2884,16 @@ netdev_dpdk_vhost_get_custom_stats(const struct
> > netdev *netdev,
> >  custom_stats->counters[0].value = dev->tx_retries;
> >  rte_spinlock_unlock(&dev->stats_lock);
> >
> > +ovs_strlcpy(custom_stats->counters[1].name,
> > VHOST_STAT_TX_CONTENTIONS,
>
> Can we maybe do something smart with the index, [1], used below in each
> case (and above [0]), as it seems prone to error.
>

I had in mind a little index variable that is incremented each time a
statistic is filled in this function.
This could be done in 
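A hypothetical sketch of that index idea (helper and constant names invented here, not the actual netdev-dpdk API):

```c
#include <stdint.h>
#include <stdio.h>

#define STAT_NAME_SIZE 64   /* stand-in for NETDEV_CUSTOM_STATS_NAME_SIZE */

struct custom_counter {
    char name[STAT_NAME_SIZE];
    uint64_t value;
};

/* Fill the next counter slot and advance the index: callers never touch
 * hard-coded [0]/[1] subscripts, so adding a statistic cannot
 * desynchronize names and values. */
static void
stat_put(struct custom_counter *counters, size_t *idx,
         const char *name, uint64_t value)
{
    struct custom_counter *c = &counters[(*idx)++];
    snprintf(c->name, sizeof c->name, "%s", name);
    c->value = value;
}
```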

Re: [ovs-dev] [PATCH] netdev-dpdk: Add a statistic on vhost tx contention.

2019-07-08 Thread Eelco Chaudron



On 7 Jul 2019, at 18:13, David Marchand wrote:


Add a statistic to help diagnose contention on the vhost txqs.
This is usually seen as dropped packets on the physical ports for 
rates

that are usually handled fine by OVS.

Signed-off-by: David Marchand 
---
This patch applies on top of Kevin "vhost-retry" series [1].

1: 
https://patchwork.ozlabs.org/project/openvswitch/list/?submitter=70482

---
 Documentation/topics/dpdk/vhost-user.rst | 25 
+

 lib/netdev-dpdk.c| 20 ++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst 
b/Documentation/topics/dpdk/vhost-user.rst

index 724aa62..e276b21 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -553,6 +553,31 @@ shown with::

   $ ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_retries

+vhost tx contention
+---
+
+Contrary to physical ports where the number of Transmit queues is 
decided at
+the configuration of the port and does not change once set up, the 
number of
+Transmit queues for a vhost-user or vhost-user-client interface is 
negotiated
+with the guest when its driver enables a virtio interface and OVS has 
no

+control over it.
+
+Each PMD thread needs a Transmit queue to send packets on such a 
port.
+To accommodate the cases where the number of PMD threads differs 
from the


Maybe make it more explicit, i.e. where the number of PMD threads is 
higher than the number of tx queues.


+number of Transmit queues enabled by a Virtual Machine, the Transmit 
queues are
+distributed among the PMD threads and their accesses are protected by 
a

+spinlock.
+
+When contention occurs (because two PMD threads are mapped to the 
same Transmit

+queue), it can be hard to understand this phenomenon.
+It usually will be seen as OVS taking more cycles to handle the 
packet and can
+be seen as packets getting dropped on the receive side (rx_dropped 
statistic

+on the physical ports).
+
+A statistic for the port is available to catch those cases::
+
+  ovs-vsctl get Interface dpdkvhostclient0 statistics:tx_contentions
+
 vhost-user Dequeue Zero Copy (experimental)
 ---

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0e70e44..844fb3c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -139,10 +139,11 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / 
ROUND_DOWN_POW2(MAX_NB_MBUF / MIN_NB_MBUF))

 #define XSTAT_RX_JABBER_ERRORS   "rx_jabber_errors"

 /* Size of vHost custom stats. */
-#define VHOST_CUSTOM_STATS_SIZE  1
+#define VHOST_CUSTOM_STATS_SIZE  2

 /* Names of vHost custom stats. */
 #define VHOST_STAT_TX_RETRIES"tx_retries"
+#define VHOST_STAT_TX_CONTENTIONS"tx_contentions"

 #define SOCKET0  0

@@ -340,6 +341,8 @@ struct dpdk_tx_queue {
 * pmd threads (see 
'concurrent_txq'). */
 int map;   /* Mapping of configured 
vhost-user queues

 * to enabled by guest. */
+uint64_t tx_contentions;   /* Number of times a pmd has been 
waiting
+* for another to xmit on this 
queue. */

 };

 /* dpdk has no way to remove dpdk ring ethernet devices
@@ -2387,7 +2390,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, 
int qid,

 goto out;
 }

-rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
+if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
+rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
+atomic_count_inc64(&dev->tx_q[qid].tx_contentions);


I would swap the two lines above, as the atomic will probably finish 
before the lock is free’ed.



+}

 cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
 /* Check has QoS has been configured for the netdev */
@@ -2878,6 +2884,16 @@ netdev_dpdk_vhost_get_custom_stats(const struct 
netdev *netdev,

 custom_stats->counters[0].value = dev->tx_retries;
 rte_spinlock_unlock(&dev->stats_lock);

+ovs_strlcpy(custom_stats->counters[1].name, 
VHOST_STAT_TX_CONTENTIONS,


Can we maybe do something smart with the index, [1], used below in each 
case (and above [0]), as it seems prone to error.



+NETDEV_CUSTOM_STATS_NAME_SIZE);
+custom_stats->counters[1].value = 0;
+for (unsigned int i = 0; i < netdev->n_txq; i++) {
+uint64_t tx_contentions;
+
+atomic_read_relaxed(&dev->tx_q[i].tx_contentions, &tx_contentions);

+custom_stats->counters[1].value += tx_contentions;
+}
+
 ovs_mutex_unlock(&dev->mutex);

 return 0;
--
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 0/3] ovn-controller: Fix and refactor chassis ovn-sbdb record init

2019-07-08 Thread Dumitru Ceara
On Fri, Jul 5, 2019 at 11:46 PM Ben Pfaff  wrote:
>
> On Wed, Jun 12, 2019 at 12:37:57PM +0200, Dumitru Ceara wrote:
> > The chassis_run code didn't take into account the scenario when the
> > system-id was changed in the Open_vSwitch table. Due to this the code
> > was trying to insert a new Chassis record in the OVN_Southbound DB with
> > the same Encaps as the previous Chassis record. The transaction used
> > to insert the new records was aborting due to the ["type", "ip"]
> > index constraint violation as we were creating new Encap entries with
> > the same "type" and "ip" as the old ones.
>
> It looks like this series is still needed but that it no longer
> applies.  I am sorry about that.  Would please rebase it?

Hi Ben,

I sent a rebased v2:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=118303

Thanks,
Dumitru
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 3/3] ovn-controller: Update stale chassis entry at init

2019-07-08 Thread Dumitru Ceara
The first time ovn-controller initializes the Chassis entry (shortly
after start up) we first check whether there is a stale Chassis record in
the OVN_Southbound DB, by looking for old Encap entries associated with a
Chassis record that match the new tunnel configuration. If one is found,
it means that ovn-controller didn't shut down gracefully the last time it
ran, so it didn't clean up the Chassis table. Potentially, in the
meantime, the OVS system-id was also changed. We then update the stale
entry with the new configuration and store the last configured chassis-id
in memory to avoid walking the Chassis table every time.

Signed-off-by: Dumitru Ceara 
---
 ovn/controller/chassis.c|   49 +++
 ovn/controller/chassis.h|1 +
 ovn/controller/ovn-controller.c |6 +++--
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index 5d4ecbd..04b98d8 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -436,17 +436,49 @@ chassis_build_encaps(struct ovsdb_idl_txn *ovnsb_idl_txn,
 return encaps;
 }
 
+/*
+ * Returns a pointer to a chassis record from 'chassis_table' that
+ * matches at least one tunnel config.
+ */
+static const struct sbrec_chassis *
+chassis_get_stale_record(const struct sbrec_chassis_table *chassis_table,
+ const struct ovs_chassis_cfg *ovs_cfg,
+ const char *chassis_id)
+{
+const struct sbrec_chassis *chassis_rec;
+
+SBREC_CHASSIS_TABLE_FOR_EACH (chassis_rec, chassis_table) {
+for (size_t i = 0; i < chassis_rec->n_encaps; i++) {
+if (sset_contains(&ovs_cfg->encap_type_set,
+  chassis_rec->encaps[i]->type) &&
+sset_contains(&ovs_cfg->encap_ip_set,
+  chassis_rec->encaps[i]->ip)) {
+return chassis_rec;
+}
+if (strcmp(chassis_rec->name, chassis_id) == 0) {
+return chassis_rec;
+}
+}
+}
+
+return NULL;
+}
+
 /* If this is a chassis config update after we initialized the record once
  * then we should always be able to find it with the ID we saved in
  * chassis_state.
- * Otherwise (i.e., first time we create the record) we create a new record.
+ * Otherwise (i.e., the first time we create the record) we check if there's
+ * a stale record from a previous controller run that didn't end gracefully,
+ * and reuse it. If there is none, we create a new record.
  */
 static const struct sbrec_chassis *
 chassis_get_record(struct ovsdb_idl_txn *ovnsb_idl_txn,
struct ovsdb_idl_index *sbrec_chassis_by_name,
+   const struct sbrec_chassis_table *chassis_table,
+   const struct ovs_chassis_cfg *ovs_cfg,
const char *chassis_id)
 {
-const struct sbrec_chassis *chassis_rec = NULL;
+const struct sbrec_chassis *chassis_rec;
 
if (chassis_info_id_inited(&chassis_state)) {
 chassis_rec = chassis_lookup_by_name(sbrec_chassis_by_name,
@@ -455,8 +487,13 @@ chassis_get_record(struct ovsdb_idl_txn *ovnsb_idl_txn,
 VLOG_WARN("Could not find Chassis : stored (%s) ovs (%s)",
  chassis_info_id(&chassis_state), chassis_id);
 }
-} else if (ovnsb_idl_txn) {
-chassis_rec = sbrec_chassis_insert(ovnsb_idl_txn);
+} else {
+chassis_rec =
+chassis_get_stale_record(chassis_table, ovs_cfg, chassis_id);
+
+if (!chassis_rec && ovnsb_idl_txn) {
+chassis_rec = sbrec_chassis_insert(ovnsb_idl_txn);
+}
 }
 return chassis_rec;
 }
@@ -523,6 +560,7 @@ const struct sbrec_chassis *
 chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 struct ovsdb_idl_index *sbrec_chassis_by_name,
 const struct ovsrec_open_vswitch_table *ovs_table,
+const struct sbrec_chassis_table *chassis_table,
 const char *chassis_id,
 const struct ovsrec_bridge *br_int,
 const struct sset *transport_zones)
@@ -536,7 +574,8 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 }
 
 const struct sbrec_chassis *chassis_rec =
-chassis_get_record(ovnsb_idl_txn, sbrec_chassis_by_name, chassis_id);
+chassis_get_record(ovnsb_idl_txn, sbrec_chassis_by_name,
+   chassis_table, &ovs_cfg, chassis_id);
 
 /* If we found (or created) a record, update it with the correct config
  * and store the current chassis_id for fast lookup in case it gets
diff --git a/ovn/controller/chassis.h b/ovn/controller/chassis.h
index 1366683..16a131a 100644
--- a/ovn/controller/chassis.h
+++ b/ovn/controller/chassis.h
@@ -33,6 +33,7 @@ const struct sbrec_chassis *chassis_run(
 struct ovsdb_idl_txn *ovnsb_idl_txn,
 struct ovsdb_idl_index *sbrec_chassis_by_name,
 const struct ovsrec_open_vswitch_table *,
+const struct sbrec_chassis_table *,
  

[ovs-dev] [PATCH v2 2/3] ovn-controller: Refactor chassis.c to abstract the string parsing

2019-07-08 Thread Dumitru Ceara
Abstract out the chassis config string processing and use library data
structures (e.g., sset).
Rename the get_chassis_id function in ovn-controller.c to
get_ovs_chassis_id to avoid confusion with the newly added
chassis_get_id function from chassis.c, which returns the last
successfully configured chassis-id.

Signed-off-by: Dumitru Ceara 
---
 ovn/controller/chassis.c|  583 +--
 ovn/controller/ovn-controller.c |   18 +
 2 files changed, 390 insertions(+), 211 deletions(-)

diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index 8099f30..5d4ecbd 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -74,6 +74,42 @@ chassis_info_id(const struct chassis_info *info)
return ds_cstr_ro(&info->id);
 }
 
+/*
+ * Structure for storing the chassis config parsed from the ovs table.
+ */
+struct ovs_chassis_cfg {
+/* Single string fields parsed from external-ids. */
+const char *hostname;
+const char *bridge_mappings;
+const char *datapath_type;
+const char *encap_csum;
+const char *cms_options;
+const char *chassis_macs;
+
+/* Set of encap types parsed from the 'ovn-encap-type' external-id. */
+struct sset encap_type_set;
+/* Set of encap IPs parsed from the 'ovn-encap-ip' external-id. */
+struct sset encap_ip_set;
+/* Interface type list formatted in the OVN-SB Chassis required format. */
+struct ds iface_types;
+};
+
+static void
+ovs_chassis_cfg_init(struct ovs_chassis_cfg *cfg)
+{
+sset_init(&cfg->encap_type_set);
+sset_init(&cfg->encap_ip_set);
+ds_init(&cfg->iface_types);
+}
+
+static void
+ovs_chassis_cfg_destroy(struct ovs_chassis_cfg *cfg)
+{
+sset_destroy(&cfg->encap_type_set);
+sset_destroy(&cfg->encap_ip_set);
+ds_destroy(&cfg->iface_types);
+}
+
 void
 chassis_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 {
@@ -85,20 +121,21 @@ chassis_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 }
 
 static const char *
-pop_tunnel_name(uint32_t *type)
+get_hostname(const struct smap *ext_ids)
 {
-if (*type & GENEVE) {
-*type &= ~GENEVE;
-return "geneve";
-} else if (*type & STT) {
-*type &= ~STT;
-return "stt";
-} else if (*type & VXLAN) {
-*type &= ~VXLAN;
-return "vxlan";
-}
-
-OVS_NOT_REACHED();
+const char *hostname = smap_get_def(ext_ids, "hostname", "");
+
+if (strlen(hostname) == 0) {
+static char hostname_[HOST_NAME_MAX + 1];
+
+if (gethostname(hostname_, sizeof(hostname_))) {
+hostname_[0] = 0;
+}
+
+return &hostname_[0];
+}
+
+return hostname;
 }
 
 static const char *
@@ -119,6 +156,22 @@ get_cms_options(const struct smap *ext_ids)
 return smap_get_def(ext_ids, "ovn-cms-options", "");
 }
 
+static const char *
+get_encap_csum(const struct smap *ext_ids)
+{
+return smap_get_def(ext_ids, "ovn-encap-csum", "true");
+}
+
+static const char *
+get_datapath_type(const struct ovsrec_bridge *br_int)
+{
+if (br_int && br_int->datapath_type) {
+return br_int->datapath_type;
+}
+
+return "";
+}
+
 static void
 update_chassis_transport_zones(const struct sset *transport_zones,
const struct sbrec_chassis *chassis_rec)
@@ -139,240 +192,366 @@ update_chassis_transport_zones(const struct sset 
*transport_zones,
 sset_destroy(&chassis_tzones_set);
 }
 
-/* Returns this chassis's Chassis record, if it is available. */
-const struct sbrec_chassis *
-chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
-struct ovsdb_idl_index *sbrec_chassis_by_name,
-const struct ovsrec_open_vswitch_table *ovs_table,
-const char *chassis_id,
-const struct ovsrec_bridge *br_int,
-const struct sset *transport_zones)
+/*
+ * Parse an ovs 'encap_type' string and store the resulting types in the
+ * 'encap_type_set' string set.
+ */
+static bool
+chassis_parse_ovs_encap_type(const char *encap_type,
+ struct sset *encap_type_set)
 {
-const struct ovsrec_open_vswitch *cfg;
-const char *encap_type, *encap_ip;
+sset_from_delimited_string(encap_type_set, encap_type, ",");
 
-const struct sbrec_chassis *chassis_rec = NULL;
+const char *type;
 
-if (chassis_info_id_inited(&chassis_state)) {
-chassis_rec = chassis_lookup_by_name(sbrec_chassis_by_name,
- chassis_info_id(&chassis_state));
+SSET_FOR_EACH (type, encap_type_set) {
+if (!get_tunnel_type(type)) {
+VLOG_INFO("Unknown tunnel type: %s", type);
+}
 }
 
-if (!ovnsb_idl_txn) {
-return chassis_rec;
+return true;
+}
+
+/*
+ * Parse an ovs 'encap_ip' string and store the resulting IP representations
+ * in the 'encap_ip_set' string set.
+ */
+static bool
+chassis_parse_ovs_encap_ip(const char *encap_ip, struct sset *encap_ip_set)
+{
+sset_from_delimited_string(encap_ip_set, encap_ip, ",");
+return true;
+}
+

[ovs-dev] [PATCH v2 1/3] ovn-controller: Fix chassis ovn-sbdb record init

2019-07-08 Thread Dumitru Ceara
The chassis_run code didn't take into account the scenario when the
system-id was changed in the Open_vSwitch table. Due to this the code
was trying to insert a new Chassis record in the OVN_Southbound DB with
the same Encaps as the previous Chassis record. The transaction used
to insert the new records was aborting due to the ["type", "ip"]
index constraint violation as we were creating new Encap entries with
the same "type" and "ip" as the old ones.

In order to fix this issue the flow is now:
1. the first time ovn-controller initializes the Chassis (shortly after
start up) we store the chassis-id.
2. for subsequent chassis_run calls we use last configured
chassis-id stored at the previous step to lookup the old Chassis record.
3. when ovn-controller shuts down gracefully we lookup the Chassis
record based on the chassis-id stored in memory at steps 1 and 2 above.
This is to avoid failing to cleanup the Chassis record in OVN_Southbound
DB if the OVS system-id changes between the last call to chassis_run and
chassis_cleanup.

Reported-at: https://bugzilla.redhat.com/1708146
Reported-by: Haidong Li 
Signed-off-by: Dumitru Ceara 
---
 ovn/controller/chassis.c|   82 +--
 ovn/controller/chassis.h|1 
 ovn/controller/ovn-controller.c |2 -
 tests/ovn-controller.at |9 
 4 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/ovn/controller/chassis.c b/ovn/controller/chassis.c
index d0277ae..8099f30 100644
--- a/ovn/controller/chassis.c
+++ b/ovn/controller/chassis.c
@@ -36,6 +36,44 @@ VLOG_DEFINE_THIS_MODULE(chassis);
 #define HOST_NAME_MAX 255
 #endif /* HOST_NAME_MAX */
 
+/*
+ * Structure to hold chassis specific state (currently just chassis-id)
+ * to avoid database lookups when changes happen while the controller is
+ * running.
+ */
+struct chassis_info {
+/* Last ID we initialized the Chassis SB record with. */
+struct ds id;
+
+/* True if Chassis SB record is initialized, false otherwise. */
+uint32_t id_inited : 1;
+};
+
+static struct chassis_info chassis_state = {
+.id = DS_EMPTY_INITIALIZER,
+.id_inited = false,
+};
+
+static void
+chassis_info_set_id(struct chassis_info *info, const char *id)
+{
+ds_clear(&info->id);
+ds_put_cstr(&info->id, id);
+info->id_inited = true;
+}
+
+static bool
+chassis_info_id_inited(const struct chassis_info *info)
+{
+return info->id_inited;
+}
+
+static const char *
+chassis_info_id(const struct chassis_info *info)
+{
+return ds_cstr_ro(&info->id);
+}
+
 void
 chassis_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 {
@@ -110,16 +148,20 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 const struct ovsrec_bridge *br_int,
 const struct sset *transport_zones)
 {
-const struct sbrec_chassis *chassis_rec
-= chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);
+const struct ovsrec_open_vswitch *cfg;
+const char *encap_type, *encap_ip;
+
+const struct sbrec_chassis *chassis_rec = NULL;
+
+if (chassis_info_id_inited(&chassis_state)) {
+chassis_rec = chassis_lookup_by_name(sbrec_chassis_by_name,
+ chassis_info_id(&chassis_state));
+}
+
 if (!ovnsb_idl_txn) {
 return chassis_rec;
 }
 
-const struct ovsrec_open_vswitch *cfg;
-const char *encap_type, *encap_ip;
-static bool inited = false;
-
 cfg = ovsrec_open_vswitch_table_first(ovs_table);
 if (!cfg) {
 VLOG_INFO("No Open_vSwitch row defined.");
@@ -263,10 +305,8 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 
 if (same) {
 /* Nothing changed. */
-inited = true;
-ds_destroy(&iface_types);
-return chassis_rec;
-} else if (!inited) {
+goto inited;
+} else if (!chassis_info_id_inited(&chassis_state)) {
 struct ds cur_encaps = DS_EMPTY_INITIALIZER;
 for (int i = 0; i < chassis_rec->n_encaps; i++) {
 ds_put_format(&cur_encaps, "%s,",
@@ -293,7 +333,6 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 smap_add(&ext_ids, "datapath-type", datapath_type);
 smap_add(&ext_ids, "iface-types", iface_types_str);
 chassis_rec = sbrec_chassis_insert(ovnsb_idl_txn);
-sbrec_chassis_set_name(chassis_rec, chassis_id);
 sbrec_chassis_set_hostname(chassis_rec, hostname);
 sbrec_chassis_set_external_ids(chassis_rec, &ext_ids);
 smap_destroy(&ext_ids);
@@ -327,7 +366,13 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
 free(tokstr);
 free(encaps);
 
-inited = true;
+inited:
+/* Store the name of the chassis for further lookups. */
+if (!chassis_rec->name || strcmp(chassis_id, chassis_rec->name)) {
+sbrec_chassis_set_name(chassis_rec, chassis_id);
+chassis_info_set_id(&chassis_state, chassis_id);
+}
+
 return chassis_rec;
 }
 
@@ -393,3 +438,16 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
 }
 

[ovs-dev] [PATCH v2 0/3] ovn-controller: Fix and refactor chassis ovn-sbdb record init

2019-07-08 Thread Dumitru Ceara
The chassis_run code didn't take into account the scenario when the
system-id was changed in the Open_vSwitch table. Due to this the code
was trying to insert a new Chassis record in the OVN_Southbound DB with
the same Encaps as the previous Chassis record. The transaction used
to insert the new records was aborting due to the ["type", "ip"]
index constraint violation as we were creating new Encap entries with
the same "type" and "ip" as the old ones.

This series:
1. introduces a (partial) fix for the reported issue by storing in
ovn-controller memory the last successfully configured chassis-id.
2. refactors the code in chassis.c to abstract out the string
processing and make available the ovs configuration to the code.
that looks up stale chassis entries in the OVN_Southbound DB
3. completes the fix at point 1 above by taking care of the
scenario when ovn-controller restarts after stopping forcefully
(e.g., due to a crash) and the OVS system-id has changed in the
meantime.

Dumitru Ceara (3):
  ovn-controller: Fix chassis ovn-sbdb record init
  ovn-controller: Refactor chassis.c to abstract the string parsing
  ovn-controller: Update stale chassis entry at init


 ovn/controller/chassis.c|  676 +++
 ovn/controller/chassis.h|2 
 ovn/controller/ovn-controller.c |   26 +-
 tests/ovn-controller.at |9 +
 4 files changed, 501 insertions(+), 212 deletions(-)


---
v2: rebase
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 0/5] Quicker pmd threads reloads

2019-07-08 Thread Eelco Chaudron

Reviewed the patchset, and the changes look fine to me.

Ack for the series…

Acked-by: Eelco Chaudron 


On 4 Jul 2019, at 13:59, David Marchand wrote:


We have been testing the rebalance code in different situations while
having traffic going through OVS.
Those tests have shown that part of the observed packet losses is due 
to
some time wasted in signaling/waiting for the pmd threads to reload 
their

polling configurations.

This series is an attempt at getting pmd threads reloads quicker and
more deterministic.

Example of number of cycles spent by a pmd between two polling
configurations (in cycles minimum/average/maximum of 1000 changes):
- d65586b6fa97: 125332/288153/717400
- patch1:   125860/267639/793113
- patch2:44720/182703/464042
- patch3:13968/159362/487000
- patch4:13621/161201/500806
- patch5:12422/ 18788/ 45818

Changelog since v2:
- remove unneeded synchronisation on pmd thread join in patch 2

Changelog since v1:
- incorporated Ilya suggestions in patch 2 and 3
- dropped previous acks on patch 2 and 3 but kept them on patch 4 and 
5 since

  there is no major change in them

Changelog since RFC v2:
- added ack from Eelco

Changelog since RFC v1:
- added numbers per patch in cover letter
- added memory ordering for explicit synchronisations between threads
  in patch 1 and patch 2

--
David Marchand

David Marchand (5):
  dpif-netdev: Convert exit latch to flag.
  dpif-netdev: Trigger parallel pmd reloads.
  dpif-netdev: Do not sleep when swapping queues.
  dpif-netdev: Only reload static tx qid when needed.
  dpif-netdev: Catch reloads faster.

 lib/dpif-netdev.c | 107 
+-

 1 file changed, 73 insertions(+), 34 deletions(-)

--
1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev