Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Tue, Jul 9, 2019 at 11:05 AM Han Zhou wrote: > > > On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote: > > > > > > > > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique > wrote: > > > > > > > > > > > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > > >> > > >> > > >> > > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < > dalva...@redhat.com> wrote: > > >> > > > >> > Thanks a lot Han for the answer! > > >> > > > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara > wrote: > > >> > > > > > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > > >> > > > wrote: > > >> > > > > > > >> > > > > Hi Han, all, > > >> > > > > > > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of > OpenStack > > >> > > > > using OVN and wanted to present some results and issues that > we've > > >> > > > > found with the Incremental Processing feature in > ovn-controller. Below > > >> > > > > is the scenario that we executed: > > >> > > > > > > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute > nodes. OVS > > >> > > > > 2.10. > > >> > > > > * The test consists on: > > >> > > > > - Create openstack network (OVN LS), subnet and router > > >> > > > > - Attach subnet to the router and set gw to the external > network > > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs > to allow > > >> > > > > UDP, SSH and ICMP). > > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by > > >> > > > > attaching it to a network namespace. > > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' > in NB) > > >> > > > > - Wait until the test can ping the port > > >> > > > > * Running browbeat/rally with 16 simultaneous process to > execute the > > >> > > > > test above 150 times. 
> > >> > > > > * When all the 150 'fake VMs' are created, browbeat will > delete all > > >> > > > > the OpenStack/OVN resources. > > >> > > > > > > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results > which showed > > >> > > > > 100% success but ovn-controller is quite loaded (as expected) > in all > > >> > > > > the nodes especially during the deletion phase: > > >> > > > > > > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/8ffKKYF > > >> > > > > > > >> > > > > After conducting the tests above, we replaced ovn-controller > in all 7 > > >> > > > > nodes by the one with the current master branch (actually > from last > > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the > > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The > expected > > >> > > > > results were to get less ovn-controller CPU usage and also > better > > >> > > > > times due to the Incremental Processing feature introduced > recently. > > >> > > > > However, the results don't look very good: > > >> > > > > > > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 > > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/99kiyDp > > >> > > > > > > >> > > > > One thing that we can tell from the ovs-vswitchd CPU > consumption is > > >> > > > > that it's much less in the Incremental Processing (IP) case > which > > >> > > > > apparently doesn't make much sense. This led us to think that > perhaps > > >> > > > > ovn-controller was not installing the necessary flows in the > switch > > >> > > > > and we confirmed this hypothesis by looking into the dataplane > > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via > ping > > >> > > > > when using ovn-controller from master. > > >> > > > > > > >> > > > > @Han, others, do you have any ideas as of what could be > happening > > >> > > > > here? 
We'll be able to use this setup for a few more days so > let me > > >> > > > > know if you want us to pull some other data/traces, ... > > >> > > > > > > >> > > > > Some other interesting things: > > >> > > > > On each of the compute nodes, (with an almost evenly > distributed > > >> > > > > number of logical ports bound to them), the max amount of > logical > > >> > > > > flows in br-int is ~90K (by the end of the test, right before > deleting > > >> > > > > the resources). > > >> > > > > > > >> > > > > It looks like with the IP version, ovn-controller leaks some > memory: > > >> > > > > https://imgur.com/a/trQrhWd > > >> > > > > While with OVS 2.10, it remains pretty flat during the test: > > >> > > > > https://imgur.com/a/KCkIT4O > > >> > > > > > >> > > > Hi Daniel, Han, > > >> > > > > > >> > > > I just sent a small patch for the ovn-controller memory leak: > > >> > > > https://patchwork.ozlabs.org/patch/1113758/ > > >> > > > > > >> > > > At least on my setup this is what valgrind was pointing at. > > >> > > > > > >> > > > Cheers, >
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
On Fri, Jun 21, 2019 at 12:31 AM Han Zhou wrote: > > > > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique wrote: > > > > > > > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > >> > >> > >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < dalva...@redhat.com> wrote: > >> > > >> > Thanks a lot Han for the answer! > >> > > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: > >> > > > >> > > > >> > > > >> > > > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara wrote: > >> > > > > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > >> > > > wrote: > >> > > > > > >> > > > > Hi Han, all, > >> > > > > > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of OpenStack > >> > > > > using OVN and wanted to present some results and issues that we've > >> > > > > found with the Incremental Processing feature in ovn-controller. Below > >> > > > > is the scenario that we executed: > >> > > > > > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute nodes. OVS > >> > > > > 2.10. > >> > > > > * The test consists on: > >> > > > > - Create openstack network (OVN LS), subnet and router > >> > > > > - Attach subnet to the router and set gw to the external network > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs to allow > >> > > > > UDP, SSH and ICMP). > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by > >> > > > > attaching it to a network namespace. > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' in NB) > >> > > > > - Wait until the test can ping the port > >> > > > > * Running browbeat/rally with 16 simultaneous process to execute the > >> > > > > test above 150 times. > >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete all > >> > > > > the OpenStack/OVN resources. 
> >> > > > > > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which showed > >> > > > > 100% success but ovn-controller is quite loaded (as expected) in all > >> > > > > the nodes especially during the deletion phase: > >> > > > > > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/8ffKKYF > >> > > > > > >> > > > > After conducting the tests above, we replaced ovn-controller in all 7 > >> > > > > nodes by the one with the current master branch (actually from last > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The expected > >> > > > > results were to get less ovn-controller CPU usage and also better > >> > > > > times due to the Incremental Processing feature introduced recently. > >> > > > > However, the results don't look very good: > >> > > > > > >> > > > > - Compute node: https://imgur.com/a/wuq87F1 > >> > > > > - Controller node (ovn-northd and ovsdb-servers): https://imgur.com/a/99kiyDp > >> > > > > > >> > > > > One thing that we can tell from the ovs-vswitchd CPU consumption is > >> > > > > that it's much less in the Incremental Processing (IP) case which > >> > > > > apparently doesn't make much sense. This led us to think that perhaps > >> > > > > ovn-controller was not installing the necessary flows in the switch > >> > > > > and we confirmed this hypothesis by looking into the dataplane > >> > > > > results. Out of the 150 VMs, 10% of them were unreachable via ping > >> > > > > when using ovn-controller from master. > >> > > > > > >> > > > > @Han, others, do you have any ideas as of what could be happening > >> > > > > here? We'll be able to use this setup for a few more days so let me > >> > > > > know if you want us to pull some other data/traces, ... 
> >> > > > > > >> > > > > Some other interesting things: > >> > > > > On each of the compute nodes, (with an almost evenly distributed > >> > > > > number of logical ports bound to them), the max amount of logical > >> > > > > flows in br-int is ~90K (by the end of the test, right before deleting > >> > > > > the resources). > >> > > > > > >> > > > > It looks like with the IP version, ovn-controller leaks some memory: > >> > > > > https://imgur.com/a/trQrhWd > >> > > > > While with OVS 2.10, it remains pretty flat during the test: > >> > > > > https://imgur.com/a/KCkIT4O > >> > > > > >> > > > Hi Daniel, Han, > >> > > > > >> > > > I just sent a small patch for the ovn-controller memory leak: > >> > > > https://patchwork.ozlabs.org/patch/1113758/ > >> > > > > >> > > > At least on my setup this is what valgrind was pointing at. > >> > > > > >> > > > Cheers, > >> > > > Dumitru > >> > > > > >> > > > > > >> > > > > Looking forward to hearing back :) > >> > > > > Daniel > >> > > > > > >> > > > > PS. Sorry for my previous email, I sent it by mistake without the subject > >> > > > > ___ > >> > > > > discuss mailing list > >> > > > >
Re: [ovs-discuss] [OVN] Aging mechanism for MAC_Binding table
On Mon, Jul 08, 2019 at 06:19:23PM -0700, Han Zhou wrote:
> On Thu, Jun 27, 2019 at 6:44 AM Ben Pfaff wrote:
> >
> > On Tue, Jun 25, 2019 at 01:05:21PM +0200, Daniel Alvarez Sanchez wrote:
> > > Lately we've been trying to solve certain issues related to stale
> > > entries in the MAC_Binding table (e.g. [0]). On the other hand, for
> > > the OpenStack + Octavia (Load Balancing service) use case, we see that
> > > a reused VIP can be as well affected by stale entries in this table
> > > due to the fact that it's never bound to a VIF so ovn-controller won't
> > > claim it and send the GARPs to update the neighbors.
> > >
> > > I'm not sure if other scenarios may suffer from this issue but seems
> > > reasonable to have an aging mechanism (as we discussed at some point
> > > in the past) that makes unused/old entries to expire. After talking to
> > > Numan on IRC, since a new pinctrl thread has been introduced recently
> > > [1], it'd be nice to implement this aging mechanism there.
> > > At the same time we'd be also reducing the amount of entries for long
> > > lived systems as it'd grow indefinitely.
> > >
> > > Any thoughts?
> > >
> > > Thanks!
> > > Daniel
> > >
> > > PS. With regards to the 'unused' vs 'old' entries I think it has to be
> > > 'old' rather than 'unused' as I don't see a way to reset the TTL of a
> > > MAC_Binding entry when we see packets coming. The implication is that
> > > we'll be seeing ARPs sent out more often when perhaps they're not
> > > needed. This also leads to the discussion of making the cache timeout
> > > configurable.
> >
> > I've always considered the MAC_Binding implementation incomplete because
> > of this issue and others. ovn/TODO.rst says:
> >
> > * Dynamic IP to MAC binding enhancements.
> >
> >   OVN has basic support for establishing IP to MAC bindings dynamically, using
> >   ARP.
> >
> >   * Ratelimiting.
> >
> >     From casual observation, Linux appears to generate at most one ARP per
> >     second per destination.
> >
> >     This might be supported by adding a new OVN logical action for
> >     rate-limiting.
> >
> >   * Tracking queries
> >
> >     It's probably best to only record in the database responses to queries
> >     actually issued by an L3 logical router, so somehow they have to be
> >     tracked, probably by putting a tentative binding without a MAC address
> >     into the database.
> >
> >   * Renewal and expiration.
> >
> >     Something needs to make sure that bindings remain valid and expire those
> >     that become stale.
> >
> >     One way to do this might be to add some support for time to the database
> >     server itself.
> >
> >   * Table size limiting.
> >
> >     The table of MAC bindings must not be allowed to grow unreasonably large.
> >
> > * MTU handling (fragmentation on output)
> >
> > So, what do we do about it? First, I think that adding support for time
> > to the database server is a terrible idea (even though I think I wrote
> > the above originally). Let's not do that. The following is some
> > "thinking out loud" on the subject.
> >
> > I think there's a challenge around which ovn-controller should take care
> > of a given MAC_Binding. We don't want every ovn-controller expiring
> > every binding. Ideally, we want exactly one ovn-controller expiring a
> > binding. One way would be to add an owner column (but it would be
> > better if we don't need it).
> >
> > If we want to keep track of "unused" bindings, I can imagine a
> > statistical mechanism to do that. Any user of a binding occasionally
> > and probabilistically changes a serial number column that we'd introduce
> > into the MAC_Binding table (this could be optimized to not bother if it
> > has changed recently). The owner checks the serial number every so
> > often and if it hasn't changed then it deletes the row.
> >
>
> Thanks Ben for the advice. Since the user of a binding is simply a OpenFlow
> rule matching, I guess we will need "controller" action to trigger the
> serial number column update in ovn-controller, combined with a meter action
> so that only small number of packets trigger the update. Is this what you
> are suggesting?

I had not thought that far ahead! That approach would work, although
the trigger percentage would be difficult to figure out--it seems like
really we'd want "every Nth second", not "every Nth packet".

Another approach that might work would be for ovn-controller to notice
the statistics on appropriate OpenFlow flows changing, or to use "learn"
actions as a way to make a controller action trigger only every so
often.
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
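For readers following the thread, the serial-number aging idea discussed above can be sketched as a small simulation. This is only an illustration under stated assumptions: the table is a plain Python dict rather than the real MAC_Binding schema, and the names `MacBindingTable`, `touch`, `Owner.sweep`, `min_interval`, and the `serial`/`last_bump` fields are all hypothetical. The throttle is time based ("every Nth second"), with an optional probability knob for the per-packet variant.

```python
import random


class MacBindingTable:
    """Toy stand-in for the MAC_Binding table (hypothetical layout)."""

    def __init__(self):
        # key -> {"mac": str, "serial": int, "last_bump": float}
        self.rows = {}

    def add(self, key, mac, now):
        # Creating a row counts as a recent change.
        self.rows[key] = {"mac": mac, "serial": 0, "last_bump": now}


def touch(table, key, now, min_interval=10.0, probability=1.0, rng=random):
    """Called by a user of the binding.

    Bumps the serial number at most once per `min_interval` seconds,
    and only with the given probability. Returns True if it bumped.
    """
    row = table.rows.get(key)
    if row is None:
        return False
    if now - row["last_bump"] < min_interval:
        return False  # changed recently, do not bother
    if rng.random() >= probability:
        return False
    row["serial"] += 1
    row["last_bump"] = now
    return True


class Owner:
    """The single party responsible for expiring bindings."""

    def __init__(self):
        self.seen = {}  # key -> serial observed at the previous sweep

    def sweep(self, table):
        """Delete rows whose serial did not change since the last sweep."""
        expired = []
        for key, row in list(table.rows.items()):
            if key in self.seen and self.seen[key] == row["serial"]:
                del table.rows[key]
                del self.seen[key]
                expired.append(key)
            else:
                self.seen[key] = row["serial"]
        # Forget rows that disappeared for other reasons.
        for key in list(self.seen):
            if key not in table.rows:
                del self.seen[key]
        return expired
```

In this sketch a binding survives a sweep only if some user bumped its serial since the previous sweep, so the sweep period has to be comfortably longer than `min_interval`.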
[ovs-discuss] Re:[HELP] Question about userspace geneve/vxlan port
Hi Ben:

I have read the "userspace tunnel" document for several times, but I still have no clue how could the tunnel pkt get parsed in rx direction. In my first mail, I have found the "tnl_port_receive" should only be called during upcall process, but for userspace upcall process, after miniflow_extract the metadata field has been set to "0", so the tunnel header can't be parsed.

My test env is as below, there are two OVS userspace bridges, if dpdk0 on br-provider receive a pkt with tunnel header, the pkt would be delivered to internal port br-provider, but can't be sent to OVN-XXX port on br-int. I am wondering if my test topology is wrong or there are other mechanism to parse tunnel hdr.

Thank you!

Timo

    Bridge br-int
        fail_mode: secure
        Port "ovn-e1c6a3-0"
            Interface "ovn-e1c6a3-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.142.18.12"}
        Port "vhuf77e9f1f-d9"
            Interface "vhuf77e9f1f-d9"
                type: dpdkvhostuser
        Port br-int
            Interface br-int
                type: internal
    Bridge br-provider
        Port br-provider
            Interface br-provider
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs=":02:00.0", n_rxq="2"}

--
Ben Pfaff
txfh2007
ovs-discuss
Re: Re:[ovs-discuss] Re:[HELP] Question about userspace geneve/vxlan port

Native tunneling and userspace tunneling are the same thing. The
mechanism should be symmetric: configuration for sending packets out
should also work for parsing them on the way back in.

On Mon, Jul 08, 2019 at 03:57:46PM +0800, txfh2007 wrote:
> Hi Ben:
> Thanks for your reply ! I didn't find the "native tunneling" document in
> OpenvSwitch repository. Did you mean the document "userspace-tunneling.rst".
> this document just tells us the br-phy can send tunnel pkt out, but when dpdk
> type port receives pkts with tunnel hdr, how could I configure the "native
> tunnel" mechanism to parse and handle these pkts? Or what you mean is
> currently OVS cannot handle parsing tunnel pkts in userspace ?
> > Thank you > > Timo > > > -- > Ben Pfaff > txfh2007 > ovs-discuss > Re: [ovs-discuss] Re:[HELP] Question about userspace geneve/vxlan port > > > On Thu, Jul 04, 2019 at 05:27:28PM +0800, txfh2007 via discuss wrote: > > I have found theoritically during the upcall process, task > > tnl_port_receive could be called(via upcall_cb() -> upcall_receive() > > -> xlate_lookup() ->xport_lookup). But in my env, after tracing code > > by gdb, I have found the task "tnl_port_should_receive(flow)" always > > returns "false" for flow->tunnel->ip_dst is "0", even if the pkt > > received by dpdk port has a tunnel header. > > Yes. > > > I guess the reason is in userspace task "handle_packet_upcall", the > > match.tun_md.valid has been set "false", so the expanded flow has no > > tunnel info, and also in task "miniflow_extract" in flow.c, the > > packet->md is null as in dfc_processing task the "md_is_valid" flag is > > always "false". Am I right ? > > Yes. > > OVS takes what some might consider an idiosyncratic approach to tunnel > processing. The "obvious" approach is to simply parse tunnel headers > and throw those into the flow. If OVS did that, then you'd see what you > expect, but this isn't what OVS does. > > Instead, OVS treats tunnel and their headers as metadata. This is > because of OVS's history as part of the Linux kernel. The Linux kernel > has tunnel implementations as part of the TCP/IP stack. When a tunnel > packet arrives at a physical port in Linux, it passes into the TCP/IP > stack, where it gets processed and received on a tunnel network device. > This effectively strips the tunnel headers and transforms them into > metadata. If the tunnel network device is part of an OVS bridge, then > it gets the packet at that point, and treats the metadata as something > that can be matched. > > With other datapaths, OVS expects some equivalent mechanism to exist. > For the userspace datapath, OVS implements "native tunneling" to provide > that mechanism. 
It's described in the OVS documentation.
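The "tunnel headers as metadata" approach described above can be pictured with a toy receive path. This is a conceptual sketch only, not how OVS is implemented: it assumes, for illustration, that termination is decided by the outer destination IP and UDP port, and the function name `receive`, the dict layout, and the local address 10.142.18.11 used in the usage note are hypothetical. It shows why a packet that is never terminated as a tunnel reaches the pipeline with an empty tunnel destination, which matches the `ip_dst` of "0" observed in the thread.

```python
GENEVE_UDP_PORT = 6081  # IANA-assigned Geneve port
VXLAN_UDP_PORT = 4789   # IANA-assigned VXLAN port


def receive(packet, local_ips, tunnel_udp_ports):
    """Return (packet, tunnel_metadata) as a bridge pipeline would see it.

    `packet` is a dict with an optional "outer" header dict and an
    "inner" payload dict (hypothetical layout for this sketch).
    """
    outer = packet.get("outer")
    if (outer is not None
            and outer["ip_dst"] in local_ips
            and outer["udp_dst"] in tunnel_udp_ports):
        # Terminate the tunnel: strip the outer headers and carry their
        # contents along as metadata that flows can match on.
        metadata = {
            "tun_src": outer["ip_src"],
            "tun_dst": outer["ip_dst"],
            "tun_id": outer["vni"],
        }
        return packet["inner"], metadata
    # Not terminated: the packet is handled as an ordinary frame and the
    # tunnel metadata stays empty.
    empty = {"tun_src": None, "tun_dst": None, "tun_id": None}
    return packet, empty
```

Calling `receive` with `local_ips={"10.142.18.11"}` on a Geneve packet addressed to that IP yields the inner frame plus populated metadata; with an empty `local_ips` the same packet comes back untouched with `tun_dst` unset.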
Re: [ovs-discuss] [OVN] Aging mechanism for MAC_Binding table
On Thu, Jun 27, 2019 at 6:44 AM Ben Pfaff wrote: > > On Tue, Jun 25, 2019 at 01:05:21PM +0200, Daniel Alvarez Sanchez wrote: > > Lately we've been trying to solve certain issues related to stale > > entries in the MAC_Binding table (e.g. [0]). On the other hand, for > > the OpenStack + Octavia (Load Balancing service) use case, we see that > > a reused VIP can be as well affected by stale entries in this table > > due to the fact that it's never bound to a VIF so ovn-controller won't > > claim it and send the GARPs to update the neighbors. > > > > I'm not sure if other scenarios may suffer from this issue but seems > > reasonable to have an aging mechanism (as we discussed at some point > > in the past) that makes unused/old entries to expire. After talking to > > Numan on IRC, since a new pinctrl thread has been introduced recently > > [1], it'd be nice to implement this aging mechanism there. > > At the same time we'd be also reducing the amount of entries for long > > lived systems as it'd grow indefinitely. > > > > Any thoughts? > > > > Thanks! > > Daniel > > > > PS. With regards to the 'unused' vs 'old' entries I think it has to be > > 'old' rather than 'unused' as I don't see a way to reset the TTL of a > > MAC_Binding entry when we see packets coming. The implication is that > > we'll be seeing ARPs sent out more often when perhaps they're not > > needed. This also leads to the discussion of making the cache timeout > > configurable. > > I've always considered the MAC_Binding implementation incomplete because > of this issue and others. ovn/TODO.rst says: > > * Dynamic IP to MAC binding enhancements. > > OVN has basic support for establishing IP to MAC bindings dynamically, using > ARP. > > * Ratelimiting. > > From casual observation, Linux appears to generate at most one ARP per > second per destination. > > This might be supported by adding a new OVN logical action for > rate-limiting. 
> > * Tracking queries > > It's probably best to only record in the database responses to queries > actually issued by an L3 logical router, so somehow they have to be > tracked, probably by putting a tentative binding without a MAC address > into the database. > > * Renewal and expiration. > > Something needs to make sure that bindings remain valid and expire those > that become stale. > > One way to do this might be to add some support for time to the database > server itself. > > * Table size limiting. > > The table of MAC bindings must not be allowed to grow unreasonably large. > > * MTU handling (fragmentation on output) > > So, what do we do about it? First, I think that adding support for time > to the database server is a terrible idea (even though I think I wrote > the above originally). Let's not do that. The following is some > "thinking out loud" on the subject. > > I think there's a challenge around which ovn-controller should take care > of a given MAC_Binding. We don't want every ovn-controller expiring > every binding. Ideally, we want exactly one ovn-controller expiring a > binding. One way would be to add an owner column (but it would be > better if we don't need it). > > If we want to keep track of "unused" bindings, I can imagine a > statistical mechanism to do that. Any user of a binding occasionally > and probabilistically changes a serial number column that we'd introduce > into the MAC_Binding table (this could be optimized to not bother if it > has changed recently). The owner checks the serial number every so > often and if it hasn't changed then it deletes the row. > Thanks Ben for the advice. Since the user of a binding is simply a OpenFlow rule matching, I guess we will need "controller" action to trigger the serial number column update in ovn-controller, combined with a meter action so that only small number of packets trigger the update. Is this what you are suggesting? > The owner could also occasionally revalidate the binding. 
>
> Any thoughts?
>
> Thanks,
>
> Ben.
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
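The "controller action combined with a meter" suggestion above amounts to rate limiting how many packets get punted to ovn-controller. A minimal sketch of that effect, assuming a simple token bucket as a stand-in for the meter (the class `Meter`, the helper `count_punts`, and the rate and burst values are hypothetical, not OVN or OpenFlow APIs):

```python
class Meter:
    """Token bucket: admits roughly `rate` events per second, up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def count_punts(packet_times, meter):
    """Count how many packets would be sent to the controller."""
    return sum(1 for t in packet_times if meter.allow(t))
```

With a rate of one per second, a flow carrying a hundred packets per second still produces only about one serial-number update per second, independent of the packet rate.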
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
On Mon, Jul 8, 2019 at 5:43 PM Ben Pfaff wrote: > > Would you mind formally submitting this? It seems like the best > immediate solution. Will do, thanks a lot Ben! > > On Mon, Jul 08, 2019 at 02:27:31PM +0200, Daniel Alvarez Sanchez wrote: > > I tried a simple patch and it fixes the issue (see below). The > > question now is, do we want to do this? I think it makes sense to drop > > *all* the connections when the role changes but I'm curious to see > > what other people think: > > > > diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c > > index 4dda63a..ddbbc2e 100644 > > --- a/ovsdb/jsonrpc-server.c > > +++ b/ovsdb/jsonrpc-server.c > > @@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct > > ovsdb_jsonrpc_server *svr, > > { > > if (svr->read_only != read_only) { > > svr->read_only = read_only; > > -ovsdb_jsonrpc_server_reconnect(svr, false, > > +ovsdb_jsonrpc_server_reconnect(svr, true, > > xstrdup(read_only > > ? "making server read-only" > > : "making server > > read/write")); > > > > > > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach) > > $ovn-nbctl ls-add sw0 > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status > > state: active > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server > > tcp:192.0.2.2:6641 > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status > > state: backup > > connecting: tcp:192.0.2.2:6641 > > $ ovn-nbctl ls-add sw1 > > ovn-nbctl: transaction error: {"details":"insert operation not allowed > > when database server is in read only mode","error":"not allowed"} > > > > On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez > > wrote: > > > > > > I *think* that it may not a bug in ovsdb-server but a problem with > > > ovn-controller as it doesn't seem to be a DB change aware client. 
> > > > > > When the role changes from master to backup or viceversa, connections > > > are expected to be reestablished for all clients except those that are > > > not aware of db changes [0] (note the 'false' argument). This flag is > > > explained here [1] and looks like since ovn-controller is not > > > monitoring the Database table in the _Server database, then the > > > connection with it is not re-established. This is just a blind guess > > > but I can give it a shot :) > > > > > > [0] > > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368 > > > [1] > > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456 > > > > > > On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique > > > wrote: > > > > > > > > > > > > > > > > > > > > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez > > > > wrote: > > > >> > > > >> Hi folks, > > > >> > > > >> While working with an OpenStack environment running OVN and > > > >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that > > > >> has been probably around for a long time. The bug itself seems to be > > > >> related with ovsdb-server not updating the read-only flag properly. > > > >> > > > >> With a 3 nodes cluster running ovsdb-server in active/passive mode, > > > >> when we restart the master-node, pacemaker promotes another node as > > > >> master and moves the associated IPAddr2 resource to it. > > > >> At this point, ovn-controller instances across the cloud reconnect to > > > >> the new node but there's a window where ovsdb-server is still running > > > >> as backup. > > > >> > > > >> For those ovn-controller instances that reconnect within that window, > > > >> every attempt to write in the OVSDB will fail with "operation not > > > >> allowed when database server is in read only mode". This state will > > > >> remain forever unless a reconnection is forced. 
Restarting > > > >> ovn-controller or killing the connection (for example with tcpkill) > > > >> will make things work again. > > > >> > > > >> A workaround in OVN OCF script could be to wait for the > > > >> ovsdb_server_promote function to wait until we get 'running/active' on > > > >> that instance. > > > >> > > > >> Another open question is what should clients (in this case, > > > >> ovn-controller) do in such situation? Shall they log an error and > > > >> attempt a reconnection (rate limited)? > > > > > > > > > > > > Thanks for reporting this issue Daniel. > > > > > > > > I can easily reproduce the issue with the below commands. > > > > > > > > $ > > > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach) > > > > $ovn-nbctl ls-add sw0 > > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status > > > > state: active > > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server > > > > tcp:192.0.2.2:6641 > > > > $ovs-appctl -t
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
Would you mind formally submitting this? It seems like the best
immediate solution.

On Mon, Jul 08, 2019 at 02:27:31PM +0200, Daniel Alvarez Sanchez wrote:
> I tried a simple patch and it fixes the issue (see below). The
> question now is, do we want to do this? I think it makes sense to drop
> *all* the connections when the role changes but I'm curious to see
> what other people think:
>
> diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> index 4dda63a..ddbbc2e 100644
> --- a/ovsdb/jsonrpc-server.c
> +++ b/ovsdb/jsonrpc-server.c
> @@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
>  {
>      if (svr->read_only != read_only) {
>          svr->read_only = read_only;
> -        ovsdb_jsonrpc_server_reconnect(svr, false,
> +        ovsdb_jsonrpc_server_reconnect(svr, true,
>                                         xstrdup(read_only
>                                                 ? "making server read-only"
>                                                 : "making server read/write"));
>
>
> $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> $ovn-nbctl ls-add sw0
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: active
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server
> tcp:192.0.2.2:6641
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: backup
> connecting: tcp:192.0.2.2:6641
> $ ovn-nbctl ls-add sw1
> ovn-nbctl: transaction error: {"details":"insert operation not allowed
> when database server is in read only mode","error":"not allowed"}
>
> On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
> wrote:
> >
> > I *think* that it may not a bug in ovsdb-server but a problem with
> > ovn-controller as it doesn't seem to be a DB change aware client.
> >
> > When the role changes from master to backup or viceversa, connections
> > are expected to be reestablished for all clients except those that are
> > not aware of db changes [0] (note the 'false' argument).
This flag is > > explained here [1] and looks like since ovn-controller is not > > monitoring the Database table in the _Server database, then the > > connection with it is not re-established. This is just a blind guess > > but I can give it a shot :) > > > > [0] > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368 > > [1] > > https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456 > > > > On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique wrote: > > > > > > > > > > > > > > > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez > > > wrote: > > >> > > >> Hi folks, > > >> > > >> While working with an OpenStack environment running OVN and > > >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that > > >> has been probably around for a long time. The bug itself seems to be > > >> related with ovsdb-server not updating the read-only flag properly. > > >> > > >> With a 3 nodes cluster running ovsdb-server in active/passive mode, > > >> when we restart the master-node, pacemaker promotes another node as > > >> master and moves the associated IPAddr2 resource to it. > > >> At this point, ovn-controller instances across the cloud reconnect to > > >> the new node but there's a window where ovsdb-server is still running > > >> as backup. > > >> > > >> For those ovn-controller instances that reconnect within that window, > > >> every attempt to write in the OVSDB will fail with "operation not > > >> allowed when database server is in read only mode". This state will > > >> remain forever unless a reconnection is forced. Restarting > > >> ovn-controller or killing the connection (for example with tcpkill) > > >> will make things work again. > > >> > > >> A workaround in OVN OCF script could be to wait for the > > >> ovsdb_server_promote function to wait until we get 'running/active' on > > >> that instance. 
> > >> > > >> Another open question is what should clients (in this case, > > >> ovn-controller) do in such situation? Shall they log an error and > > >> attempt a reconnection (rate limited)? > > > > > > > > > Thanks for reporting this issue Daniel. > > > > > > I can easily reproduce the issue with the below commands. > > > > > > $ > > $export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach) > > > $ovn-nbctl ls-add sw0 > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status > > > state: active > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server > > > tcp:192.0.2.2:6641 > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server > > > $ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status > > > state: backup > > > connecting: tcp:192.0.2.2:6641 > > > $ovn-nbctl ls-add sw1 --> This should have failed. Since OVN_NB_DAEMON > > > is set, ovn-nbctl talks to the > > >
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
ovn-controller is in fact change-aware, but the _Server database doesn't
report whether a particular database is read-only or read/write. I
guess that was an oversight when I designed that schema. That means
that there's no way for clients to monitor whether a particular database
changes between read-only and read/write.

I guess there are two ways to fix it:

1. Add a read/write column to the _Server schema and implement it in
   ovsdb-server and ovn-controller.

2. Make ovsdb-server kill connections when read/write status changes.

#2 is probably what we should do right away. #1 can wait.

On Mon, Jul 08, 2019 at 01:25:09PM +0200, Daniel Alvarez Sanchez wrote:
> I *think* that it may not be a bug in ovsdb-server but a problem with
> ovn-controller as it doesn't seem to be a DB change aware client.
>
> When the role changes from master to backup or vice versa, connections
> are expected to be reestablished for all clients except those that are
> not aware of db changes [0] (note the 'false' argument). This flag is
> explained here [1] and looks like since ovn-controller is not
> monitoring the Database table in the _Server database, then the
> connection with it is not re-established. This is just a blind guess
> but I can give it a shot :)
>
> [0] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> [1] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
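To make option 2 concrete, here is a minimal toy model in C. It is only a sketch: none of these names exist in ovsdb-server, and the assumption that each session keeps a read-only snapshot taken at connection time is borrowed from Numan's observation elsewhere in this thread rather than checked against the code. The point it shows is that dropping every session on a mode change forces each client to reconnect and pick up the new mode.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SESSIONS 8

/* Hypothetical toy model, not the ovsdb-server data structures. */
struct toy_session {
    bool connected;
    bool read_only;      /* Snapshot taken when the session was set up. */
};

struct toy_server {
    bool read_only;
    struct toy_session sessions[MAX_SESSIONS];
};

/* Accepts a client; the session inherits the server's current mode.
 * Returns the session index, or -1 if the table is full. */
int
toy_server_accept(struct toy_server *svr)
{
    for (int i = 0; i < MAX_SESSIONS; i++) {
        if (!svr->sessions[i].connected) {
            svr->sessions[i].connected = true;
            svr->sessions[i].read_only = svr->read_only;
            return i;
        }
    }
    return -1;
}

/* Option 2: on a mode change, drop every session so that each client
 * has to reconnect.  Returns the number of sessions dropped. */
int
toy_server_set_read_only(struct toy_server *svr, bool read_only)
{
    int dropped = 0;

    if (svr->read_only != read_only) {
        svr->read_only = read_only;
        for (int i = 0; i < MAX_SESSIONS; i++) {
            if (svr->sessions[i].connected) {
                svr->sessions[i].connected = false;
                dropped++;
            }
        }
    }
    return dropped;
}

/* A write succeeds only on a live session that was set up read/write. */
bool
toy_session_can_write(const struct toy_server *svr, int idx)
{
    return idx >= 0 && idx < MAX_SESSIONS
           && svr->sessions[idx].connected
           && !svr->sessions[idx].read_only;
}
```

In this model a client that connected while the server was still a backup cannot write until it is dropped and reconnects, which matches the stuck state described in the report.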
Re: [ovs-discuss] Re: Re:[HELP] Question about userspace geneve/vxlan port
Native tunneling and userspace tunneling are the same thing.

The mechanism should be symmetric: configuration for sending packets out
should also work for parsing them on the way back in.

On Mon, Jul 08, 2019 at 03:57:46PM +0800, txfh2007 wrote:
> Hi Ben:
> Thanks for your reply ! I didn't find the "native tunneling" document in
> OpenvSwitch repository. Did you mean the document "userspace-tunneling.rst".
> this document just tells us the br-phy can send tunnel pkt out, but when dpdk
> type port receives pkts with tunnel hdr, how could I configure the "native
> tunnel" mechanism to parse and handle these pkts? Or what you mean is
> currently OVS cannot handle parsing tunnel pkts in userspace ?
>
> Thank you
>
> Timo
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
I tried a simple patch and it fixes the issue (see below). The
question now is, do we want to do this? I think it makes sense to drop
*all* the connections when the role changes but I'm curious to see
what other people think:

diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
index 4dda63a..ddbbc2e 100644
--- a/ovsdb/jsonrpc-server.c
+++ b/ovsdb/jsonrpc-server.c
@@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
 {
     if (svr->read_only != read_only) {
         svr->read_only = read_only;
-        ovsdb_jsonrpc_server_reconnect(svr, false,
+        ovsdb_jsonrpc_server_reconnect(svr, true,
                                        xstrdup(read_only
                                                ? "making server read-only"
                                                : "making server read/write"));

$ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ ovn-nbctl ls-add sw0
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ ovn-nbctl ls-add sw1
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
I *think* that it may not be a bug in ovsdb-server but a problem with
ovn-controller, as it doesn't seem to be a DB change aware client.

When the role changes from master to backup or vice versa, connections
are expected to be reestablished for all clients except those that are
not aware of db changes [0] (note the 'false' argument). This flag is
explained here [1], and it looks like, since ovn-controller is not
monitoring the Database table in the _Server database, the connection
with it is not re-established. This is just a blind guess but I can
give it a shot :)

[0] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
[1] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456

On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique wrote:
> Thanks for reporting this issue Daniel.
>
> I can easily reproduce the issue with the below commands.
>
> $ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> $ ovn-nbctl ls-add sw0
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: active
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: backup
> connecting: tcp:192.0.2.2:6641
> $ ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is
>                             set, ovn-nbctl talks to the ovn-nbctl daemon and
>                             it is able to create a logical switch even though
>                             the db is in backup mode
> $ unset OVN_NB_DAEMON
> $ ovn-nbctl ls-add sw2
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
>
> I looked into the ovsdb-server code, when the user changes the state of the
> ovsdb-server, the read_only param of active ovsdb_server_sessions
> are not updated.
>
> Thanks
> Numan
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
Hi,

Thanks for reporting, Daniel.

On Mon, Jul 8, 2019 at 11:22 AM Daniel Alvarez Sanchez wrote:
> Another open question is what should clients (in this case,
> ovn-controller) do in such situation? Shall they log an error and
> attempt a reconnection (rate limited)?

I would say so. ovn-controller _requires_ a read-write session to
function properly. It can either keep retrying the reconnection forever
as you suggested, assert and exit if the connection is read-only, or
combine the two (retry first and then exit).

Also, we need to improve the logs for such errors. While debugging the
problem it wasn't "easy" to find out why ovn-controller wasn't updating
the database (we were looking at the nb_cfg column of the Chassis table
in the Southbound OVSDB). We checked the state of the connection (it was
stable), the process was healthy, etc. It was only when we enabled the
DBG log level for ovn-controller that we started seeing messages such
as:

2019-07-04T15:11:19.522Z|00148|jsonrpc|DBG|tcp:172.17.1.27:6642: received notification, method="update2", params=[["monid","OVN_Southbound"],{"Chassis":{"cb669c72-0f84-412c-a3bf-482119649d85":{"modify":{"nb_cfg":3300]
2019-07-04T15:11:19.522Z|00149|jsonrpc|DBG|tcp:172.17.1.27:6642: received reply, result=[{"details":"update operation not allowed when database server is in read only mode","error":"not allowed"}], id=8062

So perhaps logging it as ERROR would be better, because without the DBG
level all we could see in the logs was two INFO messages saying that it
had reconnected to the Southbound OVSDB.

Cheers,
Lucas
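The "retry first and then exit" combination could look roughly like the C sketch below. Everything here is hypothetical: ovn-controller has no toy_policy, the retry limit and the backoff values are made up, and the only detail taken from the thread is the "not allowed" error string seen in the DBG log.

```c
#include <stdbool.h>
#include <string.h>

enum toy_action { TOY_CONTINUE, TOY_RECONNECT, TOY_EXIT };

/* Hypothetical client-side policy state. */
struct toy_policy {
    int max_retries;     /* Made-up limit before giving up. */
    int retries;         /* Forced reconnections so far. */
};

/* 'error' is the "error" member of a transaction reply, or NULL when the
 * transaction succeeded.  A real client would also log at ERROR level
 * before reconnecting or exiting. */
enum toy_action
toy_on_txn_reply(struct toy_policy *p, const char *error)
{
    if (!error) {
        p->retries = 0;            /* A successful write resets the count. */
        return TOY_CONTINUE;
    }
    if (strcmp(error, "not allowed") != 0) {
        return TOY_CONTINUE;       /* Other errors are out of scope here. */
    }
    if (p->retries >= p->max_retries) {
        return TOY_EXIT;
    }
    p->retries++;
    return TOY_RECONNECT;
}

/* Rate limiting: the delay doubles with each retry, capped at 8 s. */
unsigned int
toy_backoff_ms(const struct toy_policy *p)
{
    unsigned int delay = 1000;

    for (int i = 1; i < p->retries && delay < 8000; i++) {
        delay *= 2;
    }
    return delay;
}
```

The cap keeps a long outage from turning into a reconnect storm, and the exit path hands the decision to whatever supervises the process.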
Re: [ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez wrote:
> For those ovn-controller instances that reconnect within that window,
> every attempt to write in the OVSDB will fail with "operation not
> allowed when database server is in read only mode". This state will
> remain forever unless a reconnection is forced. Restarting
> ovn-controller or killing the connection (for example with tcpkill)
> will make things work again.

Thanks for reporting this issue Daniel.

I can easily reproduce the issue with the below commands.

$ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ ovn-nbctl ls-add sw0
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is
                            set, ovn-nbctl talks to the ovn-nbctl daemon and
                            it is able to create a logical switch even though
                            the db is in backup mode
$ unset OVN_NB_DAEMON
$ ovn-nbctl ls-add sw2
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}

I looked into the ovsdb-server code: when the user changes the state of
ovsdb-server, the read_only param of the active ovsdb_server_sessions
is not updated.

Thanks
Numan
[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker
Hi folks,

While working with an OpenStack environment running OVN and
ovsdb-server in A/P configuration with Pacemaker, we hit an issue that
has probably been around for a long time. The bug itself seems to be
related to ovsdb-server not updating the read-only flag properly.

With a 3-node cluster running ovsdb-server in active/passive mode,
when we restart the master node, pacemaker promotes another node to
master and moves the associated IPAddr2 resource to it.
At this point, ovn-controller instances across the cloud reconnect to
the new node, but there's a window where ovsdb-server is still running
as backup.

For those ovn-controller instances that reconnect within that window,
every attempt to write to the OVSDB will fail with "operation not
allowed when database server is in read only mode". This state will
remain forever unless a reconnection is forced. Restarting
ovn-controller or killing the connection (for example with tcpkill)
will make things work again.

A workaround in the OVN OCF script could be for the
ovsdb_server_promote function to wait until we get 'running/active' on
that instance.

Another open question is what clients (in this case, ovn-controller)
should do in such a situation. Shall they log an error and attempt a
reconnection (rate limited)?

Thoughts?

Thanks a lot,
Daniel
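The promote-side workaround amounts to polling until the instance reports the active role. The real OCF agent is a shell script and its actual check is not shown in this thread, so the following C sketch only illustrates the logic, under stated assumptions: the helper names are hypothetical, and the parser expects the "state: active" line that ovsdb-server/sync-status prints in the transcripts in this thread.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if 'output' starts with the line "state: active", as in the
 * sync-status transcripts in this thread. */
bool
toy_status_is_active(const char *output)
{
    static const char prefix[] = "state: active";
    size_t n = sizeof prefix - 1;

    return !strncmp(output, prefix, n)
           && (output[n] == '\0' || output[n] == '\n');
}

/* Hypothetical callback that returns the latest sync-status output. */
typedef const char *toy_status_fn(void *aux);

/* Polls up to 'max_polls' times; true as soon as the role is active. */
bool
toy_wait_until_active(toy_status_fn *get_status, void *aux, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (toy_status_is_active(get_status(aux))) {
            return true;
        }
        /* A real script would sleep here between polls. */
    }
    return false;
}

/* Stand-in status source for experiments: replays canned outputs in
 * order and then keeps repeating the last one. */
struct toy_canned {
    const char *const *outputs;
    size_t n;
    size_t next;
};

const char *
toy_canned_status(void *aux)
{
    struct toy_canned *c = aux;
    const char *out = c->outputs[c->next];

    if (c->next + 1 < c->n) {
        c->next++;
    }
    return out;
}
```

Bounding the number of polls matters here, since a promote operation that never returns would presumably run into Pacemaker's own operation timeout.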
[ovs-discuss] Re: Re:[HELP] Question about userspace geneve/vxlan port
Hi Ben:

Thanks for your reply! I didn't find the "native tunneling" document in
the Open vSwitch repository. Did you mean the document
"userspace-tunneling.rst"? That document only tells us that br-phy can
send tunnel packets out. When a dpdk-type port receives packets with a
tunnel header, how can I configure the "native tunnel" mechanism to
parse and handle them? Or do you mean that OVS currently cannot parse
tunnel packets in userspace?

Thank you

Timo

--
Ben Pfaff
txfh2007
ovs-discuss
Re: [ovs-discuss] Re:[HELP] Question about userspace geneve/vxlan port

On Thu, Jul 04, 2019 at 05:27:28PM +0800, txfh2007 via discuss wrote:
> I have found theoretically during the upcall process, task
> tnl_port_receive could be called (via upcall_cb() -> upcall_receive()
> -> xlate_lookup() -> xport_lookup). But in my env, after tracing code
> by gdb, I have found the task "tnl_port_should_receive(flow)" always
> returns "false" for flow->tunnel->ip_dst is "0", even if the pkt
> received by dpdk port has a tunnel header.

Yes.

> I guess the reason is in userspace task "handle_packet_upcall", the
> match.tun_md.valid has been set "false", so the expanded flow has no
> tunnel info, and also in task "miniflow_extract" in flow.c, the
> packet->md is null as in dfc_processing task the "md_is_valid" flag is
> always "false". Am I right ?

Yes.

OVS takes what some might consider an idiosyncratic approach to tunnel
processing. The "obvious" approach is to simply parse tunnel headers
and throw those into the flow. If OVS did that, then you'd see what you
expect, but this isn't what OVS does.

Instead, OVS treats tunnels and their headers as metadata. This is
because of OVS's history as part of the Linux kernel. The Linux kernel
has tunnel implementations as part of the TCP/IP stack. When a tunnel
packet arrives at a physical port in Linux, it passes into the TCP/IP
stack, where it gets processed and received on a tunnel network device.
This effectively strips the tunnel headers and transforms them into
metadata. If the tunnel network device is part of an OVS bridge, then
it gets the packet at that point, and treats the metadata as something
that can be matched.

With other datapaths, OVS expects some equivalent mechanism to exist.
For the userspace datapath, OVS implements "native tunneling" to provide
that mechanism. It's described in the OVS documentation.
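As a rough illustration of "strip the tunnel headers and transform them into metadata", here is a small self-contained C sketch for VXLAN. It is not the OVS native tunneling code: the structs and function names are hypothetical, the outer IP/UDP headers are assumed to be parsed already, and only the VXLAN header layout (8 bytes, "I" flag 0x08, 24-bit VNI in bytes 4 to 6) and UDP port 4789 come from the VXLAN specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_DST_PORT 4789   /* IANA-assigned VXLAN UDP port. */
#define VXLAN_HLEN 8          /* Flags, reserved, VNI, reserved. */
#define VXLAN_FLAG_VNI 0x08   /* "I" flag: the VNI field is valid. */

/* Hypothetical, simplified tunnel metadata; ip_dst == 0 means the
 * packet did not arrive through a tunnel. */
struct toy_tnl_md {
    uint32_t ip_dst;
    uint32_t vni;
};

/* Outer headers, assumed to be parsed from the packet already. */
struct toy_outer {
    uint32_t ip_dst;
    uint16_t udp_dst;
    const uint8_t *udp_payload;
    size_t udp_payload_len;
};

/* If the packet is VXLAN addressed to 'local_ip', fills 'md' from the
 * outer headers and returns a pointer to the inner frame.  Otherwise
 * leaves 'md' zeroed and returns NULL. */
const uint8_t *
toy_vxlan_decap(const struct toy_outer *o, uint32_t local_ip,
                struct toy_tnl_md *md)
{
    md->ip_dst = 0;
    md->vni = 0;
    if (o->ip_dst != local_ip || o->udp_dst != VXLAN_DST_PORT
        || o->udp_payload_len < VXLAN_HLEN
        || !(o->udp_payload[0] & VXLAN_FLAG_VNI)) {
        return NULL;
    }
    md->ip_dst = o->ip_dst;
    md->vni = (uint32_t) o->udp_payload[4] << 16
              | (uint32_t) o->udp_payload[5] << 8
              | o->udp_payload[6];
    return o->udp_payload + VXLAN_HLEN;
}

/* Same idea as the check discussed in the thread: without tunnel
 * metadata (ip_dst == 0) there is nothing for a tunnel port to receive. */
bool
toy_tnl_should_receive(const struct toy_tnl_md *md)
{
    return md->ip_dst != 0;
}
```

In this picture, a packet that reaches flow lookup straight from the physical port still has zeroed metadata, which is consistent with tnl_port_should_receive() returning false in the gdb trace: the decapsulation step has to run first.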