Re: [ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt
Update: In the upstream 5.4 Linux kernel, skb_vlan_pop() only clears the vlan_present flag, whereas the old 4.15 kernel cleared the full tag: https://github.com/torvalds/linux/blob/v5.4/net/core/skbuff.c#L5408

    int skb_vlan_pop(struct sk_buff *skb)
    {
        u16 vlan_tci;
        __be16 vlan_proto;
        int err;

        if (likely(skb_vlan_tag_present(skb))) {
            __vlan_hwaccel_clear_tag(skb);
        } else {
        ...

    static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
    {
        skb->vlan_present = 0;    /* only clears the 'present' flag */
    }

Hence, we patched stt on branch-2.16 OVS:

    ## update __push_stt_header on ovs 2.16
    diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
    index 39a294764..ad1f0aa39 100644
    --- a/datapath/linux/compat/stt.c
    +++ b/datapath/linux/compat/stt.c
    @@ -622,7 +622,9 @@ static int __push_stt_header(struct sk_buff *skb, __be64 tun_id,
             stth->flags |= STT_CSUM_VERIFIED;
         }

    -    stth->vlan_tci = htons(skb->vlan_tci);
    +    if (skb_vlan_tag_present(skb)) {
    +        stth->vlan_tci = htons(skb->vlan_tci);
    +    }
         skb->vlan_tci = 0;
         put_unaligned(tun_id, &stth->key);

It looks like this part of the Linux change was either not called out or missed on the stt side. Let us know if any further amendments are needed on the above change; the issue is mitigated with this patch and the workaround is no longer needed. We will run some more tests and report any other failures.

Regards,
Aliasgar

On Tue, Apr 23, 2024 at 10:35 AM aginwala wrote:
> Hi:
>
> The data plane restores when cleaning up flows with ovs-dpctl del-flows, and
> eventually all the flows catch up, since the flows added by OVN are intact.
> However, we are not sure which flow caused this, as the issue pops up on
> ovs-vswitchd restarts and needs to be worked around with dpctl del-flows. Not
> sure if it's due to version compatibility between OVN 2.11 and OVS 2.16, or
> whether a particular patch in ovs/ovn already has this fix. Will keep looking
> in parallel as the workaround unblocks this for now. Any additional
> pointers beyond this workaround would be good too.
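As an illustration only, the behavioral difference the stt patch above addresses can be modeled in a few lines of Python (a sketch, not the kernel C code; the `skb` is reduced to a flag and a TCI value):

```python
# Python model of the __push_stt_header fix. After the 5.4 kernel's
# skb_vlan_pop(), the 'present' flag is cleared but skb->vlan_tci may
# still hold a stale value.
def push_stt_vlan_tci(vlan_present, vlan_tci, patched=True):
    """Return the vlan_tci value written into the STT header."""
    if patched:
        # Patched behavior: copy the TCI only when a tag is present.
        return vlan_tci if vlan_present else 0
    # Unpatched behavior: always copy, leaking a stale TCI.
    return vlan_tci

# State after skb_vlan_pop() on 5.4: present=False, stale TCI of 20.
assert push_stt_vlan_tci(False, 20, patched=False) == 20  # vlan 20 leaks
assert push_stt_vlan_tci(False, 20, patched=True) == 0    # tag stripped
```

This mirrors why tcpdump on the destination kept showing vlan 20 even though the strip_vlan OpenFlow actions were correct: the leak happened at STT encapsulation time, after the OpenFlow pipeline.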
> > Regards, > Aliasgar > > > On Fri, Apr 19, 2024 at 4:24 PM aginwala wrote: > >> Hi All: >> >> Part of upgrading OVN north south gateway to the new 5.4 kernel , VMs >> connectivity is lost when setting chassis for provider network lrp to this >> new gateway. For interconnection gateways and hypervisors its not an issue/ >> lrp >> _uuid : 387a735d-fc11-4e90-8655-07785aa024af >> chassis : b80a285b-586a-42d9-b189-69d641f143b1 >> datapath: d9219b69-5961-4f24-8414-1d4054b23169 >> external_ids: {} >> gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50, >> 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf, >> d1b42374-c475-4745-abdb-36e72140c5b5] >> logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e" >> mac : ["74:db:d1:80:d3:af 10.169.247.140/24"] >> nat_addresses : [] >> options : >> {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"} >> parent_port : [] >> tag : [] >> tunnel_key : 2 >> type: chassisredirect >> >> provider network >> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90 >> type: localnet >> tag: 20 >> addresses: ["unknown"] >> ## encap ip for ovn is on eth0 >> >> ## gw interfaces brens2f0 hosts uplink provider network >> ovs-vsctl list-br >> br-int >> brens2f0 >> ovs-vsctl list-ports brens2f0 >> ens2f0 >> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int >> ## fail mode secure >> ovs-vsctl get-fail-mode br-int >> secure >> ## set chassis >> ovn-nbctl lrp-set-gateway-chassis >> lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e >> cee81be9-f782-4c82-800e-c5c5327531e4 101 >> >> ovn-controller is running as a container on the new gateway >> ovn-controller --version >> ovn-controller (Open vSwitch) 2.11.1-13 >> OpenFlow versions 0x4:0x4 >> >> ## ovs on the host 5.4 kernel >> ovs-vsctl --version >> ovs-vsctl (Open vSwitch) 2.16.0 >> DB Schema 8.3.0 >> >> ovs-ofctl --version >> ovs-ofctl (Open vSwitch) 2.16.0 >> OpenFlow versions 0x1:0x6 >> >> >> Digging further with tcpdump on the destination vm interface shows 
vlan >> being present causing connectivity failure and no reply packet >> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q >> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id >> 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 > >&g
Re: [ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt
Hi: Data plane restores when cleaning up flows using ovs-dpctl del-flows and eventually all the flows catch up as flows added by ovn are intact. However, not sure what flow caused this as the issue pops up on ovs-vswitchd restarts and needs to be workaround by dpctl del-flows. Not sure if it's due to version compatibility with 2.11 ovn and 2.16 ovs or any particular patch in ovs/ovn that already has this fix . Will keep looking in parallel as the workaround unblocks this for now. Any additional pointers would be good too vs this workaround. Regards, Aliasgar On Fri, Apr 19, 2024 at 4:24 PM aginwala wrote: > Hi All: > > Part of upgrading OVN north south gateway to the new 5.4 kernel , VMs > connectivity is lost when setting chassis for provider network lrp to this > new gateway. For interconnection gateways and hypervisors its not an issue/ > lrp > _uuid : 387a735d-fc11-4e90-8655-07785aa024af > chassis : b80a285b-586a-42d9-b189-69d641f143b1 > datapath: d9219b69-5961-4f24-8414-1d4054b23169 > external_ids: {} > gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50, > 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf, > d1b42374-c475-4745-abdb-36e72140c5b5] > logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e" > mac : ["74:db:d1:80:d3:af 10.169.247.140/24"] > nat_addresses : [] > options : > {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"} > parent_port : [] > tag : [] > tunnel_key : 2 > type: chassisredirect > > provider network > port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90 > type: localnet > tag: 20 > addresses: ["unknown"] > ## encap ip for ovn is on eth0 > > ## gw interfaces brens2f0 hosts uplink provider network > ovs-vsctl list-br > br-int > brens2f0 > ovs-vsctl list-ports brens2f0 > ens2f0 > patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int > ## fail mode secure > ovs-vsctl get-fail-mode br-int > secure > ## set chassis > ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e > 
cee81be9-f782-4c82-800e-c5c5327531e4 101 > > ovn-controller is running as a container on the new gateway > ovn-controller --version > ovn-controller (Open vSwitch) 2.11.1-13 > OpenFlow versions 0x4:0x4 > > ## ovs on the host 5.4 kernel > ovs-vsctl --version > ovs-vsctl (Open vSwitch) 2.16.0 > DB Schema 8.3.0 > > ovs-ofctl --version > ovs-ofctl (Open vSwitch) 2.16.0 > OpenFlow versions 0x1:0x6 > > > Digging further with tcpdump on the destination vm interface shows vlan > being present causing connectivity failure and no reply packet > 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q > (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id > 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 > > 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64 > 20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q > (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id > 36269, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 > > 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64 > > openflow rules for atrip vlan 20 is correct that are programmed with ovn > on new/old gw : > ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20 > cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198, > idle_age=0, priority=100,reg15=0x1,metadata=0x1 > actions=mod_vlan_vid:20,output:161,strip_vlan > cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783, > idle_age=0, priority=150,in_port=161,dl_vlan=20 > actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8) > > > Checking ovs datapath flow shows vlan being present > ovs-dpctl dump-flows | grep vlan > 
recirc_id(0x422),tunnel(tun_id=0x1006605,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)), > packets:1713, bytes:174726, used:0.145s, actions:5 > > Couldn't find much drift with ofproto/trace > ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20 > running on old/new gw (replace with in_port) > > > Tried stripping on the hypervisor/compute and data plane is ok but thats > not the right approach > ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20 > actions=strip_vlan,outp
[ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt
Hi All:

As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM connectivity is lost when setting the chassis for the provider network lrp to this new gateway. For interconnection gateways and hypervisors it's not an issue.

lrp
_uuid            : 387a735d-fc11-4e90-8655-07785aa024af
chassis          : b80a285b-586a-42d9-b189-69d641f143b1
datapath         : d9219b69-5961-4f24-8414-1d4054b23169
external_ids     : {}
gateway_chassis  : [728adc6d-3236-4637-86e3-0f6745cf1b50, 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf, d1b42374-c475-4745-abdb-36e72140c5b5]
logical_port     : "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
mac              : ["74:db:d1:80:d3:af 10.169.247.140/24"]
nat_addresses    : []
options          : {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
parent_port      : []
tag              : []
tunnel_key       : 2
type             : chassisredirect

provider network
port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
    type: localnet
    tag: 20
    addresses: ["unknown"]
## encap ip for ovn is on eth0

## gw interface brens2f0 hosts the uplink provider network
ovs-vsctl list-br
br-int
brens2f0
ovs-vsctl list-ports brens2f0
ens2f0
patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
## fail mode secure
ovs-vsctl get-fail-mode br-int
secure
## set chassis
ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e cee81be9-f782-4c82-800e-c5c5327531e4 101

ovn-controller is running as a container on the new gateway:
ovn-controller --version
ovn-controller (Open vSwitch) 2.11.1-13
OpenFlow versions 0x4:0x4

## ovs on the host 5.4 kernel
ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.16.0
DB Schema 8.3.0

ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.16.0
OpenFlow versions 0x1:0x6

Digging further, tcpdump on the destination VM interface shows the vlan still being present, causing connectivity failure and no reply packet:
20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 > 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64
20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 36269, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 > 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64

The OpenFlow rules to strip vlan 20 that OVN programs on the new/old gw are correct:
ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198, idle_age=0, priority=100,reg15=0x1,metadata=0x1 actions=mod_vlan_vid:20,output:161,strip_vlan
cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783, idle_age=0, priority=150,in_port=161,dl_vlan=20 actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)

Checking the OVS datapath flows shows the vlan still being present:
ovs-dpctl dump-flows | grep vlan
recirc_id(0x422),tunnel(tun_id=0x1006605,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)), packets:1713, bytes:174726, used:0.145s, actions:5

Couldn't find much drift with ofproto/trace, run on both the old and new gw (replacing in_port accordingly):
ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20

Tried stripping on the hypervisor/compute and the data plane is then OK, but that's not the right approach:
ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20 actions=strip_vlan,output:4597"

Downgrading the kernel to 4.15 and pinning to ovs 2.11 restores the data plane, with no vlan/802.1Q in the tcpdump on the destination workload tap interface. Is this a bug or known issue with later (post-2.11) versions of OVS when a tagged vlan is present for the provider network? Tried pinning the OpenFlow version to 1.4 too, but it didn't help much since the strip_vlan flows are good. Any further pointers would be great as we continue to debug.

Regards,
Aliasgar
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
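The telltale symptom in the debugging session above is a datapath flow that still matches on a vlan header after the tunnel. A quick scan of `ovs-dpctl dump-flows` output for such flows can be sketched like this (illustration only, not an OVS tool; the sample line is taken from the report above):

```python
import re

# Sample datapath flow from the report: a tunneled flow that still
# carries an eth_type(0x8100)/vlan(vid=20/...) match.
sample = (
    "recirc_id(0x422),tunnel(tun_id=0x1006605,src=10.172.66.144,"
    "dst=10.173.84.83,flags(-df+csum+key)),in_port(1),"
    "eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),"
    "encap(eth_type(0x0800),ipv4(frag=no)), packets:1713, actions:5"
)

def vlan_vids(dpctl_output):
    """Return the vlan IDs matched by datapath flows in dump-flows output."""
    return [int(vid) for vid in re.findall(r"vlan\(vid=(\d+)", dpctl_output)]

assert vlan_vids(sample) == [20]  # vlan 20 leaked into the datapath flow
```

On a healthy gateway the same scan over `ovs-dpctl dump-flows` output should return an empty list for tunneled traffic that OVN has already stripped.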
[ovs-discuss] 2.16.0 deb: compilation/build error
Hi All:

We ran into a Debian build issue for the latest ovs v2.16.0 against 5.4.0-80-generic on Ubuntu 20:

dh binary --with autoreconf,python3 --parallel
dh: error: unable to load addon python3: Can't locate Debian/Debhelper/Sequence/python3.pm in @INC (you may need to install the Debian::Debhelper::Sequence::python3 module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.30.0 /usr/local/share/perl/5.30.0 /usr/lib/x86_64-linux-gnu/perl5/5.30 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.30 /usr/share/perl/5.30 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at (eval 13) line 1.
BEGIN failed--compilation aborted at (eval 13) line 1.
BEGIN failed--compilation aborted at (eval 13) line 1.
make: *** [debian/rules:25: binary] Error 255

We were able to fix it by installing dh-python explicitly as a build dependency. Should we include this in the debian/control dependencies?

diff --git a/debian/control b/debian/control
index 6420b9d3e2..53a6b61f14 100644
--- a/debian/control
+++ b/debian/control
@@ -18,7 +18,8 @@ Build-Depends: graphviz,
     python3-twisted,
     python3-zope.interface,
     libunbound-dev,
-    libunwind-dev
+    libunwind-dev,
+    dh-python

Ali
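A change like the one proposed above can be sanity-checked by parsing the control file's Build-Depends field. The sketch below (illustration only; the stanza is a trimmed, hypothetical sample, not the real openvswitch debian/control) uses the Debian convention that continuation lines of a field start with whitespace:

```python
# Trimmed sample control stanza with the proposed dh-python dependency.
CONTROL = """\
Source: openvswitch
Build-Depends: graphviz,
 python3-twisted,
 libunwind-dev,
 dh-python
"""

def build_depends(control_text):
    """Return the package names listed in the Build-Depends field."""
    deps, in_field = [], False
    for line in control_text.splitlines():
        if line.startswith("Build-Depends:"):
            in_field = True
            line = line.split(":", 1)[1]
        elif in_field and line[:1] in (" ", "\t"):
            pass  # whitespace-led line continues the same field
        elif in_field:
            break  # a new field ends Build-Depends
        else:
            continue
        deps += [d.strip() for d in line.split(",") if d.strip()]
    return deps

assert "dh-python" in build_depends(CONTROL)
```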
Re: [ovs-discuss] HA OVN "Central" as a kubernetes service
Hi:

Adding the ML too. Folks from k8s can comment on whether the ovn-k8s repo needs a documentation update so that you can get the setup working using their specs as-is, without any code changes, in addition to using your own custom OVN images, etc. I am getting a mail failure when adding the ovn-k8s Google group, as I think I don't have permission to post there. The yaml specs and raft scripts also have good comments which can give you a clear idea. Also cc'd Girish, who can comment further.

Things like volumes (PV) for dedicated ovn central nodes, monitoring, backing up the ovn db, etc. also need to be considered, so that when a pod is restarted or the OVN version is upgraded, cluster settings are retained and cluster health stats are taken into consideration. I got the design aspect of it sorted a week ago and had an internal review too (cc Han), as we do not use OVN as the CNI either, including some pending containerization items for the OVN global dbs and the OVN interconnect controller to use for OVN interconnect. However, testing in k8s with all the specs/tweaks is pending due to some other priorities. As the approach taken by ovn-k8s is succinct and already tested, it shouldn't be a bottleneck. I agree that the overall documentation needs to be consolidated, on either the ovn-k8s side or the ovn repo.

On Mon, Jul 6, 2020 at 9:49 AM Brendan Doyle wrote:
> Hi,
>
> I've been trying to follow the instructions at
> https://github.com/ovn-org/ovn-kubernetes
> to set up an OVN "Central/Master" high availability (HA) cluster. I want to
> deploy and manage that cluster as a Kubernetes service.
>
> I can find lots of stuff on "ovn-kube" but this seems to be using OVN as
> a kubernetes CNI instead of Flannel etc. But this is not what I want to do;
> I have a kubernetes cluster using Flannel as the CNI, and now I want to
> deploy a HA OVN "Central" as a kubernetes service. Kind of like how you can
> deploy a MySQL cluster in kubernetes using a StatefulSet deployment.
> > I have found this: > https://github.com/ovn-org/ovn-kubernetes#readme > > But it is not clear to me if this is how to setup OVN as a kubernetes > CNI or it's how to setup a HA OVN central as kubernetes service. > > I did try he steps in the READMe above, but they did not seem to work, then > I have just seen that there is a ovnkube-db-raft.yaml file, this seems more > promising as it does use a StatefulSet, but I can find no documentation > on this > file. > > Thanks > > Brendan > > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] HA OVN "Central" as a kubernetes service
On Mon, Jul 6, 2020 at 4:33 AM Brendan Doyle wrote: > Hi, > > So I'm really confused by what you have pointed me to here. As stated I do > NOT > want to use OVN as a CNI. I have a k8s cluster that use flannel as the > CNI. I simply > want to create an OVN "central" cluster as a Stateful set in my *existing* > K8 > config. > > This repo: > > https://github.com/ovn-org/ovn-kubernetes/commit/a07b1a01af7e37b15c2e5f179ffad2b9f25a083d > > Seems to be for setting up a K8s cluster to use OVN as the CNI?? > Have you tried this? > What IP do the ovn-controllers use to reach the OVN "central cluster? > It seems to use an OVN docker image from docker.io, I want to use my own > OVN src > Do I use/modify the dist/images/Dockerfile in this repo? that has loads of > references to CNI > like I said I don't want to use OVN as the CNI?? > > A pre-req for running ovn central as a k8s app is containerize ovn central components. Hence, you need to start your own containers using docker. Either you follow the approach from ovn-k8s repo as to how to build ovn images or refer to the docker instructions in ovn repo. Since this app (ovn central) will run behind a k8s service, ovn-controller should point to the service ip of ovn central k8s app. k8s folks can comment on how to build image that is in k8s pod specs e.g http://docker.io/ovnkube/ovn-daemonset:latest > > The instructions here > https://github.com/ovn-org/ovn/blob/d6b56b1629d5984ef91864510f918e232efb89de/Documentation/intro/install/general.rst > seem more promising, if not a little confusing: > > IN the section "Starting OVN Central services in containers" > > Export following variables in .env and place it under project root: > > $ OVN_BRANCH= > $ OVN_VERSION= > $ DISTRO= > $ KERNEL_VERSION= > $ GITHUB_SRC= > $ DOCKER_REPO= > > > Does it mean create a file called ".env" and place it in the toplevel dir > of the cloned ovn repo? > Or does it mean just add these to you shell environment (i.e put them in > .bashrc)? 
> > You can just export OVN_BRANCH=xx in your shell for all variables and build your containers with desired distro/version using make build > > Then we have: > > 1) > > Start OVN containers using below command: > > $ docker run -itd --net=host --name=ovn-nb \ > : ovn-nb-tcp > > $ docker run -itd --net=host --name=ovn-sb \ > : ovn-sb-tcp > > $ docker run -itd --net=host --name=ovn-northd \ > : ovn-northd-tcp > > followed by > > 2) > > $ docker run -e "host_ip=" -e "nb_db_port=" -itd \ > --name=ovn-nb-raft --net=host --privileged : \ > ovn-nb-cluster-create > > $ docker run -e "host_ip=" -e "sb_db_port=" -itd \ > --name=ovn-sb-raft --net=host --privileged : \ > ovn-sb-cluster-create > > $ docker run -e "OVN_NB_DB=tcp::6641,tcp::6641,\ > tcp::6641" -e "OVN_SB_DB=tcp::6642,tcp::6642,\ > tcp::6642" -itd --name=ovn-northd-raft : \ > ovn-northd-cluster > > Does it mean do 1), then 2) or does it mean do 1) for non HA OVN central > *OR* 2) > for HA/clustered OVN Central? > > Doc says Start OVN containers in cluster mode using below command on node2 and node3 to make them join the peer using below command:. Hence, you can even play with just docker on 3 nodes where you run step1 on node1 that creates cluster and do the join-cluster on rest two nodes to give you a clear idea before moving to pod in k8s. Not sure if you need more details to update doc. We can always improvise. Upstream ovn-k8s does the same for pods where e.g. ovn-kube0 pod creates a cluster and rest two pods joins > It's not clear > > Thanks > > > > > > > On 25/06/2020 17:36, aginwala wrote: > > Hi: > > There are a couple of options as I have been exploring this too: > > 1. Upstream ovn-k8s patches ( > https://github.com/ovn-org/ovn-kubernetes/commit/a07b1a01af7e37b15c2e5f179ffad2b9f25a083d) > uses statefulset and headless service for starting ovn central raft cluster > with 3 replicas. Cluster startup code and pod specs are pretty neat that > addresses most of the doubts. 
> > OVN components have been containerized too to start them in pods. You can > also refer to > https://github.com/ovn-org/ovn/blob/d6b56b1629d5984ef91864510f918e232efb89de/Documentation/intro/install/general.rst > for the same and use them to make it work in pod specs too. > > > 2. Write a new ovn operator similar to etcd operator > https://github.com/coreos/etcd-operator which just takes the count of > raft replicas and does the job in the background. > > I also added ovn-k8s group so they can comment on any other ideas too. > Hope it helps. > > > > On Thu,
Re: [ovs-discuss] HA OVN "Central" as a kubernetes service
Hi: There are a couple of options as I have been exploring this too: 1. Upstream ovn-k8s patches ( https://github.com/ovn-org/ovn-kubernetes/commit/a07b1a01af7e37b15c2e5f179ffad2b9f25a083d) uses statefulset and headless service for starting ovn central raft cluster with 3 replicas. Cluster startup code and pod specs are pretty neat that addresses most of the doubts. OVN components have been containerized too to start them in pods. You can also refer to https://github.com/ovn-org/ovn/blob/d6b56b1629d5984ef91864510f918e232efb89de/Documentation/intro/install/general.rst for the same and use them to make it work in pod specs too. 2. Write a new ovn operator similar to etcd operator https://github.com/coreos/etcd-operator which just takes the count of raft replicas and does the job in the background. I also added ovn-k8s group so they can comment on any other ideas too. Hope it helps. On Thu, Jun 25, 2020 at 7:15 AM Brendan Doyle wrote: > Hi, > > So I'm trying to find information on setting up an OVN "Central/Master" > high availability (HA) > Not as Active-Backup with Pacemaker, but as a cluster. But I want to > deploy and manage that > cluster as a Kubernetes service . > > I can find lots of stuff on "ovn-kube" but this seems to be using OVN as > a kubernetes CNI instead of > Flannel etc. But this is not what I want to do, I have a kubernetes > cluster using Flannel as the CNI, > now I want to deploy a HA OVN "Central" as a kubernetes service. Kind > of like how you can deploy > a MySQL cluster in kubernetes using a SatefulSet deployment. > > I have found this: > https://github.com/ovn-org/ovn-kubernetes#readme > > But it is not clear to me if this is how to setup OVN as a kubernetes > CNI or it's how to setup a HA > OVN central as kubernetes service. > > Can anybody comment, has anyone done this? > > > I guess I could run an OVN central as standalone and use a kubernetes > deployment with 3 > replica sets and "export" as a NodePort service. 
And have a > floating/VIP on my kubernetes > nodes. And direct ovn-controllers to the VIP. So only the pod that holds > the VIP would service > requests. This would work and give HA, but you don't get the performance > of an OVN > clustered Database Model, where each OVN central could service requests. > > > > > Thanks > > > Rdgs > Brendan > > ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] [OVN] How to set inactivity_probe between members in RAFT cluster
On Fri, Feb 7, 2020 at 6:26 PM taoyunupt wrote:
> Hi, Aliasgar,
> Maybe I need to tell you the way I deployed the RAFT cluster, to help you
> understand my situation. I have 3 servers; the IPs are 192.5.0.8, 192.5.0.9
> and 192.5.0.10.
> After reading my steps, you may see why my RAFT cluster has no output for
> "ovn-sbctl get-connection", yet it also works most of the time.
> If the way I deployed the cluster is not good, please point it out.
> Thanks very much.

Hi Yun:

Your approach to starting the cluster seems correct. The reason you don't see a connection entry after creating the cluster is that the connection entry is needed for clients to connect to the cluster, not to form the cluster. Hence, you just need an additional step to create one nb and one sb connection entry, for which you can set the connection to ptcp:6641/42 so that clients like northd, ovn-controller, etc. can connect to the cluster. Please also refer to the clustered database section in https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst for more details.
> 1. First step: create the cluster with ovsdb-tool commands
>
> Create the cluster on the first node; the IP address of this node is 192.5.0.8:
> # ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db /usr/share/openvswitch/ovn-sb.ovsschema tcp:192.5.0.8:6644
> # ovsdb-tool create-cluster /etc/openvswitch/ovnnb_db.db /usr/share/openvswitch/ovn-nb.ovsschema tcp:192.5.0.8:6643
>
> Join the cluster on the second node; the IP address of this node is 192.5.0.9:
> # ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:192.5.0.9:6644 tcp:192.5.0.8:6644 tcp:192.5.0.10:6644
> # ovsdb-tool join-cluster /etc/openvswitch/ovnnb_db.db OVN_Northbound tcp:192.5.0.9:6643 tcp:192.5.0.8:6643 tcp:192.5.0.10:6643
>
> Join the cluster on the third node; the IP address of this node is 192.5.0.10:
> # ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:192.5.0.10:6644 tcp:192.5.0.8:6644 tcp:192.5.0.9:6644
> # ovsdb-tool join-cluster /etc/openvswitch/ovnnb_db.db OVN_Northbound tcp:192.5.0.10:6643 tcp:192.5.0.8:6643 tcp:192.5.0.9:6643
>
> 2. Second step: configure the cluster
>
> Edit the /etc/sysconfig/ovn-northd file on each node and add the OVN_NORTHD_OPTS option and content.
> For the first node (IP 192.5.0.8) the added content is (other nodes are similar):
>
> OVN_NORTHD_OPTS="--db-nb-addr=192.5.0.8 --db-nb-create-insecure-remote=yes --db-sb-addr=192.5.0.8 \
> --db-sb-create-insecure-remote=yes --db-nb-cluster-local-addr=192.5.0.8 --db-sb-cluster-local-addr=192.5.0.8 \
> --ovn-northd-nb-db=tcp:192.5.0.8:6641,tcp:192.5.0.9:6641,tcp:192.5.0.10:6641 \
> --ovn-northd-sb-db=tcp:192.5.0.8:6642,tcp:192.5.0.9:6642,tcp:192.5.0.10:6642"
>
> 3. Third step: start the cluster
>
> Execute the following command to start the cluster:
> # systemctl restart openvswitch ovn-northd
>
> Regards,
> Yun
>
> On 2020-02-07 22:45:36, "taoyunupt" wrote:
> Hi, Aliasgar,
>
> Thanks for your reply. I have tried your suggestion, but I found that it
> could only create one NB connection or one SB connection. In RAFT, we need
> at least two. That means the output of 'ovn-nbctl get-connection' should
> have two lines. What do you think, if I want to fix this problem? Maybe you
> don't need to consider how to have two connections for NB. Actually, I want
> to know how to solve the "inactivity_probe" problem.
>
> Regards,
> Yun
>
> At 2020-02-07 03:05:37, "aginwala" wrote:
> Hi Yun:
>
> For changing inactivity probe which is 5 sec de
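The three-node remote lists used in OVN_NORTHD_OPTS above (for --ovn-northd-nb-db and --ovn-northd-sb-db) follow a simple pattern that can be generated with a small helper (a sketch, not part of OVN; `db_remotes` is a hypothetical name):

```python
# Build the comma-separated remote list for a clustered OVN db,
# e.g. "tcp:192.5.0.8:6641,tcp:192.5.0.9:6641,tcp:192.5.0.10:6641".
def db_remotes(ips, port, proto="tcp"):
    """Return one proto:ip:port entry per cluster member, comma-joined."""
    return ",".join(f"{proto}:{ip}:{port}" for ip in ips)

nodes = ["192.5.0.8", "192.5.0.9", "192.5.0.10"]
assert db_remotes(nodes, 6641) == (
    "tcp:192.5.0.8:6641,tcp:192.5.0.9:6641,tcp:192.5.0.10:6641"
)
```

The same helper covers the sb db by swapping in port 6642, which keeps the nb/sb option strings consistent across all three nodes.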
Re: [ovs-discuss] [OVN] How to set inactivity_probe between members in RAFT cluster
Hi Yun:

For changing the inactivity probe, which defaults to 5 sec, you need to create a connection entry for both the sb and nb db:

ovn-nbctl -- --id=@conn_uuid create Connection \
    target="\:\:" \
    inactivity_probe= -- set NB_Global . connections=@conn_uuid

ovn-nbctl set connection . inactivity_probe= will then work!

To tune the election timer for raft on, say, the nb db, you can use the command below:

ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/change-election-timer OVN_Northbound 

You can apply similar settings to the sb db to tune its value.

On Wed, Feb 5, 2020 at 4:00 AM taoyunupt wrote:
> Hi, Numan,
> I hit a problem where there are frequent elections among the RAFT cluster
> members. I think it was caused by a poor connection between members of the
> RAFT cluster, as the log shows.
> Because the output of "ovn-sbctl get-connection" is empty on a RAFT cluster
> member, the command "ovn-sbctl set connection . inactivity_probe=18" does
> not work.
> Do you know how to set "inactivity_probe" when we use a RAFT cluster? It
> would be appreciated if you have more suggestions.
> > > 2020-02-05T01:37:29.178Z|03424|reconnect|ERR|tcp:10.254.8.210:52048: no > response to inactivity probe after 5 seconds, disconnecting > 2020-02-05T01:37:30.519Z|03425|raft|INFO|tcp:10.xxx.8.210:59300: learned > server ID cdec > 2020-02-05T01:37:30.519Z|03426|raft|INFO|tcp:10.xxx.8.210:59300: learned > remote address tcp:10.254.8.210:6643 > 2020-02-05T03:52:02.791Z|03427|raft|INFO|received leadership transfer from > 3e2e in term 64 > 2020-02-05T03:52:02.791Z|03428|raft|INFO|term 65: starting election > 2020-02-05T03:52:02.792Z|03429|reconnect|INFO|tcp:10.xxx.8.208:6643: > connection closed by peer > 2020-02-05T03:52:02.869Z|03430|raft|INFO|term 65: elected leader by 2+ of > 3 servers > 2020-02-05T03:52:03.210Z|03431|raft|INFO|tcp:10.xxx.8.208:46140: learned > server ID 3e2e > 2020-02-05T03:52:03.210Z|03432|raft|INFO|tcp:10.xxx.8.208:46140: learned > remote address tcp:10.xxx.8.208:6643 > 2020-02-05T03:52:03.793Z|03433|reconnect|INFO|tcp:10.254.8.208:6643: > connecting... > 2020-02-05T03:52:03.793Z|03434|reconnect|INFO|tcp:10.254.8.208:6643: > connected > > > Thanks, > Yun > ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
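When triaging logs like the excerpt above, the relevant signal is the reconnect error about the inactivity probe, which suggests the probe interval is too low for the cluster's load. A quick scan can be sketched as follows (illustration only, not an OVS tool; the sample lines are taken from the log above):

```python
import re

# Two lines from the ovsdb-server raft log quoted above.
log = """\
2020-02-05T01:37:29.178Z|03424|reconnect|ERR|tcp:10.254.8.210:52048: no response to inactivity probe after 5 seconds, disconnecting
2020-02-05T03:52:02.791Z|03428|raft|INFO|term 65: starting election
"""

def probe_timeouts(log_text):
    """Return (remote, seconds) pairs for inactivity-probe disconnects."""
    pat = (r"\|reconnect\|ERR\|(\S+): no response to inactivity probe "
           r"after (\d+) seconds")
    return [(remote, int(sec)) for remote, sec in re.findall(pat, log_text)]

assert probe_timeouts(log) == [("tcp:10.254.8.210:52048", 5)]
```

If such disconnects cluster around election events, raising inactivity_probe via the Connection entry (and, if needed, the raft election timer) as described above is the usual remedy.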
Re: [ovs-discuss] OVS/OVN docker image for each stable release
On Tue, Nov 12, 2019 at 10:57 PM Numan Siddique wrote: > On Wed, Nov 13, 2019 at 12:02 AM Shivaram Mysore > wrote: > > > > No need to indicate "built on Ubuntu" for docker image tags. > > Alpine tag is specifically used as it used different libraries and image > size is small. Ideally, for Docker images, we should use Alpine Linux. If > OVS for Alpine is latest, then image size will be further reduced. > > > > Note thAt at the end of the day, container is just a delivery or > packaging vehicle. > > > > /Shivaram > > ::Sent from my mobile device:: > > > > On Nov 12, 2019, at 9:49 AM, aginwala wrote: > > > > > > Thanks Shivaram: > > > > On Tue, Nov 12, 2019 at 9:28 AM Shivaram Mysore < > shivaram.mys...@gmail.com> wrote: > >> > >> I am not sure why "*_debian" is used. The image should work across > OS. I have not seen use of "*_linux" as most docker images use some form > of shell scripts. > >> > > Because the container image published is ubuntu and hence we tagged it > with _debian. It doesn't indicate it will not work on rhel. If we all agree > we can remove the tags and update the readme.md on docker.io that each > container image is using ubuntu as base image. I am fine with any approach. > >> > >> Also, in my opinion, the docker image should not build OVS. If it can > add appropriate OVS packages like > https://github.com/servicefractal/ovs/blob/master/Dockerfile is better as > they are already tested. Building OVS as a part of this will cause more > testing impacts and is unnecessary. The objective is to run OVS in a > container image. I would keep it simple. > > I think the idea was to have an OVS container image with the latest > master code right Aliasgar ? Yes. E.g ovs docker image for ovs release 2.12.0 with debian/rhel will checkout v2.12.0 code from git and build it. That way source code will exist in docker image from which ovs2.12.0 will be installed on that container. 
> > > Getting OVS packages is good, but then the debian/ubuntu/fedora > packages should be updated as soon as OVS does a release. You mean to say, e.g., install with dpkg -i for the latest version pushed using *2.12.0.deb for debian and skip building from source code? So I think the open question now is: do we want to have source code in a version-specific container image with ovs installed from source code, or do we just need version-specific ovs installed without source in the container? Thanks > Numan > > >> > > I think the objective is to have an image per upstream stable ovs > release and hence building it in container. Hope everyone is ok here. > >> > >> On Tue, Nov 12, 2019 at 12:51 AM aginwala wrote: > >>> > >>> Thanks Guru. > >>> > >>> On Mon, Nov 11, 2019 at 1:03 PM Guru Shetty wrote: > >>>> > >>>> > >>>> > >>>> On Mon, 11 Nov 2019 at 10:08, aginwala wrote: > >>>>> > >>>>> > >>>>> > >>>>> On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Fri, 8 Nov 2019 at 14:41, aginwala wrote: > >>>>>>> > >>>>>>> openvswitch.ko ships default with newer kernel but if we want to > use say stt, we need to build it with respective kernel for host on which > we will run. Hence, to skip host level installation, we pack the modules > in container. > >>>>>> > >>>>>> > >>>>>> It is not clear to me. Is DKMS enabled here? Or is it that > openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on kernel > 4.15.0-66-generic? > >>>>>> > >>>>> > >>>>> No. Dkms is not enabled because the idea is to release a new docker > image for every new kernel upgrade on compute (Not sure if dkms will help > much in the container case as we are not installing on host). Do you have any > specific use case in mind? Yes, on a host with 4.15.0-66-generic. > >>>> > >>>> > >>>> It will probably be very hard to release each OVS version to so many > available kernels. How do you decide which kernel that you want to release > a image for? What is the plan here? 
I think it makes sense to release one > image without a kernel module packed with it. > >>>> > >>> Agree, we can't publish too many images based on different kernel > versions. Hence, I am ok with the approach you proposed by publishing > single image for each stable release leveraging host kernel modules.
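The open question above (keep the source tree in the version-specific image, or ship only the installed binaries) can be sketched with a multi-stage build. This is a hypothetical illustration, not the recipe behind the published openvswitch/ovs tags; the base image, tag, and dependency list are all assumptions:

```dockerfile
# Hypothetical sketch: build a given OVS release tag from source in a
# builder stage, then copy only the installed artifacts into the final
# image, so the image is version-specific but carries no source tree.
FROM ubuntu:18.04 AS build
ARG OVS_VERSION=v2.12.0
RUN apt-get update && apt-get install -y \
    git build-essential autoconf automake libtool libssl-dev python3
RUN git clone --depth 1 --branch ${OVS_VERSION} \
    https://github.com/openvswitch/ovs.git /src/ovs
RUN cd /src/ovs && ./boot.sh && ./configure --prefix=/usr && \
    make -j"$(nproc)" && make install DESTDIR=/out

# Final image: binaries only, no toolchain or checkout.
FROM ubuntu:18.04
COPY --from=build /out/usr /usr
CMD ["ovsdb-server"]
```

Dropping the final `COPY`-only stage and shipping the build stage directly would give the other option discussed here: a larger image that still contains the checked-out source.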
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Thanks Shivaram: On Tue, Nov 12, 2019 at 9:28 AM Shivaram Mysore wrote: > I am not sure why "*_debian" is used. The image should work across OS. > I have not seen use of "*_linux" as most docker images use some form of > shell scripts. > > Because the container image published is ubuntu and hence we tagged it with _debian. It doesn't indicate it will not work on rhel. If we all agree we can remove the tags and update the readme.md on docker.io that each container image is using ubuntu as base image. I am fine with any approach. > Also, in my opinion, the docker image should not build OVS. If it can add > appropriate OVS packages like > https://github.com/servicefractal/ovs/blob/master/Dockerfile is better as > they are already tested. Building OVS as a part of this will cause more > testing impacts and is unnecessary. The objective is to run OVS in a > container image. I would keep it simple. > > I think the objective is to have an image per upstream stable ovs release and hence building it in container. Hope everyone is ok here. > On Tue, Nov 12, 2019 at 12:51 AM aginwala wrote: > >> Thanks Guru. >> >> On Mon, Nov 11, 2019 at 1:03 PM Guru Shetty wrote: >> >>> >>> >>> On Mon, 11 Nov 2019 at 10:08, aginwala wrote: >>> >>>> >>>> >>>> On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty wrote: >>>> >>>>> >>>>> >>>>> On Fri, 8 Nov 2019 at 14:41, aginwala wrote: >>>>> >>>>>> openvswitch.ko ships default with newer kernel but if we want to use >>>>>> say stt, we need to build it with respective kernel for host on which we >>>>>> will run. Hence, to skip host level installation , we pack the modules in >>>>>> container. >>>>>> >>>>> >>>>> It is not clear to me. Is DKMS enabled here? Or is it that >>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on >>>>> kernel 4.15.0-66-generic? >>>>> >>>>> >>>> No. 
Dkms is not enabled because idea is to release a new docker image >>>> for every new kernel upgrade on compute (Not sure if dkms will help much in >>>> container case as we are not installing on host). Do you have any specific >>>> use case which? Yes on host with 4.15.0-66-generic. >>>> >>> >>> It will probably be very hard to release each OVS version to so many >>> available kernels. How do you decide which kernel that you want to release >>> a image for? What is the plan here? I think it makes sense to release one >>> image without a kernel module packed with it. >>> >>> Agree, we can't publish too many images based on different kernel >> versions. Hence, I am ok with the approach you proposed by publishing >> single image for each stable release leveraging host kernel modules. I have >> pushed 2 debian images for each stable releases 2.11.2_debian and >> 2.12.0_debian under openvswitch/ovs accordingly. I also sent the >> corresponding patch https://patchwork.ozlabs.org/patch/1193372/ to >> refactor the docker builds to support an option to skip kernel modules for >> ovs repo so that user can choose to build/run with/without kernel modules. >> Let me know further. >> >> >>> >>> >>>> >>>>>> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, 8 Nov 2019 at 14:18, aginwala wrote: >>>>>>> >>>>>>>> Hi all: >>>>>>>> >>>>>>>> >>>>>>>> I have pushed two images to public openvswitch org on docker.io >>>>>>>> for ovs and ovn; >>>>>>>> OVS for ubuntu with 4.15 kernel: >>>>>>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* >>>>>>>> >>>>>>> >>>>>>> Why is the kernel important here? Is the OVS kernel module being >>>>>>> packed? >>>>>>> >>>>>>> >>>>>>>> run as : docker run -itd --net=host >>>>>>>> --name=ovsdb-server openvswitch/ovs:2.12.0_debian_4.15.0-66-generic >>>>>>>> ovsdb-server >>>>>>>> docker run -itd --net=host >>>>>>>> --name=ovs-vswitchd --volumes-from=ovsdb-serve
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Thanks Guru. On Mon, Nov 11, 2019 at 1:03 PM Guru Shetty wrote: > > > On Mon, 11 Nov 2019 at 10:08, aginwala wrote: > >> >> >> On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty wrote: >> >>> >>> >>> On Fri, 8 Nov 2019 at 14:41, aginwala wrote: >>> >>>> openvswitch.ko ships default with newer kernel but if we want to use >>>> say stt, we need to build it with respective kernel for host on which we >>>> will run. Hence, to skip host level installation , we pack the modules in >>>> container. >>>> >>> >>> It is not clear to me. Is DKMS enabled here? Or is it that >>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on >>> kernel 4.15.0-66-generic? >>> >>> >> No. Dkms is not enabled because idea is to release a new docker image for >> every new kernel upgrade on compute (Not sure if dkms will help much in >> container case as we are not installing on host). Do you have any specific >> use case which? Yes on host with 4.15.0-66-generic. >> > > It will probably be very hard to release each OVS version to so many > available kernels. How do you decide which kernel that you want to release > a image for? What is the plan here? I think it makes sense to release one > image without a kernel module packed with it. > > Agree, we can't publish too many images based on different kernel versions. Hence, I am ok with the approach you proposed by publishing single image for each stable release leveraging host kernel modules. I have pushed 2 debian images for each stable releases 2.11.2_debian and 2.12.0_debian under openvswitch/ovs accordingly. I also sent the corresponding patch https://patchwork.ozlabs.org/patch/1193372/ to refactor the docker builds to support an option to skip kernel modules for ovs repo so that user can choose to build/run with/without kernel modules. Let me know further. 
> > >> >>>> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty wrote: >>>> >>>>> >>>>> >>>>> On Fri, 8 Nov 2019 at 14:18, aginwala wrote: >>>>> >>>>>> Hi all: >>>>>> >>>>>> >>>>>> I have pushed two images to public openvswitch org on docker.io for >>>>>> ovs and ovn; >>>>>> OVS for ubuntu with 4.15 kernel: >>>>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* >>>>>> >>>>> >>>>> Why is the kernel important here? Is the OVS kernel module being >>>>> packed? >>>>> >>>>> >>>>>> run as : docker run -itd --net=host --name=ovsdb-server >>>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server >>>>>> docker run -itd --net=host >>>>>> --name=ovs-vswitchd --volumes-from=ovsdb-server --privileged >>>>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd >>>>>> >>>>>> OVN debian docker image: >>>>>> *openvswitch/ovn:2.12_e60f2f2_debian_master* as we don't have a >>>>>> branch cut out for ovn yet. (Hence, tagged it with last commit on master) >>>>>> Follow steps as per: >>>>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst >>>>>> >>>>>> >>>>>> Thanks Guru for sorting out the access/cleanups for openvswitch org >>>>>> on docker.io. >>>>>> >>>>>> We can plan to align this docker push for each stable release ahead. >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala wrote: >>>>>> >>>>>>> Thanks Guru: >>>>>>> >>>>>>> Sounds good. Can you please grant user aginwala as admin? I can >>>>>>> create two repos ovs and ovn under openvswitch org and can push new >>>>>>> stable >>>>>>> release versions there. >>>>>>> >>>>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: >>>>>>> >>>>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: >>>>>>>> >>>>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy >>>>>>>>> to provide it to whoever the admin is. >>>>>>>>&g
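The "option to skip kernel modules" from the patch referenced above could be driven roughly as below. The Makefile variable names are assumptions modeled on the docker build tooling in the ovs tree, so check utilities/docker for the actual knobs:

```shell
# Hypothetical invocation of the docker build tooling (variable names are
# assumptions; see utilities/docker in the ovs tree for the real ones).
cd ovs/utilities/docker

# Per-kernel image: builds and packs openvswitch.ko for one host kernel.
make build DISTRO=debian OVS_BRANCH=branch-2.12 \
    KERNEL_VERSION=4.15.0-66-generic \
    GITHUB_SRC=https://github.com/openvswitch/ovs.git

# Single per-release image: skip the kernel modules and rely on the
# host's in-tree openvswitch.ko instead.
make build DISTRO=debian OVS_BRANCH=branch-2.12 KERNEL_VERSION=host \
    GITHUB_SRC=https://github.com/openvswitch/ovs.git
```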
Re: [ovs-discuss] OVS/OVN docker image for each stable release
On Mon, Nov 11, 2019 at 9:00 AM Guru Shetty wrote: > > > On Fri, 8 Nov 2019 at 14:41, aginwala wrote: > >> openvswitch.ko ships default with newer kernel but if we want to use say >> stt, we need to build it with respective kernel for host on which we will >> run. Hence, to skip host level installation, we pack the modules in >> container. >> > > It is not clear to me. Is DKMS enabled here? Or is it that > openvswitch/ovs:2.12.0_debian_4.15.0-66-generic will only work on > kernel 4.15.0-66-generic? > > No. Dkms is not enabled because the idea is to release a new docker image for every new kernel upgrade on compute (Not sure if dkms will help much in the container case as we are not installing on host). Do you have any specific use case in mind? Yes, on a host with 4.15.0-66-generic. > >> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty wrote: >> >>> >>> >>> On Fri, 8 Nov 2019 at 14:18, aginwala wrote: >>> >>>> Hi all: >>>> >>>> >>>> I have pushed two images to public openvswitch org on docker.io for >>>> ovs and ovn; >>>> OVS for ubuntu with 4.15 kernel: >>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* >>>> >>> >>> Why is the kernel important here? Is the OVS kernel module being packed? >>> >>> >>>> run as : docker run -itd --net=host --name=ovsdb-server >>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server >>>> docker run -itd --net=host --name=ovs-vswitchd >>>> --volumes-from=ovsdb-server --privileged >>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd >>>> >>>> OVN debian docker image: *openvswitch/ovn:2.12_e60f2f2_debian_master* >>>> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last >>>> commit on master) >>>> Follow steps as per: >>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst >>>> >>>> >>>> Thanks Guru for sorting out the access/cleanups for openvswitch org on >>>> docker.io. >>>> >>>> We can plan to align this docker push for each stable release ahead. 
>>>> >>>> >>>> >>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala wrote: >>>> >>>>> Thanks Guru: >>>>> >>>>> Sounds good. Can you please grant user aginwala as admin? I can create >>>>> two repos ovs and ovn under openvswitch org and can push new stable >>>>> release >>>>> versions there. >>>>> >>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: >>>>> >>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: >>>>>> >>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy >>>>>>> to provide it to whoever the admin is. >>>>>>> >>>>>> >>>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it >>>>>> has one stale image. >>>>>> >>>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while >>>>>> true; do echo hello world; sleep 1; done" >>>>>> >>>>>> So if we want the name "openvswitch", this is one option. If we >>>>>> prefer ovs/ovn or other keywords, then the admin can create a new one. >>>>>> >>>>>> >>>>>>> >>>>>>> On Thu, 7 Nov 2019 at 13:15, aginwala wrote: >>>>>>> >>>>>>>> Hi All: >>>>>>>> >>>>>>>> As discussed in the meeting today, we all agreed that it will be a >>>>>>>> good idea to push docker images for each new ovs/ovn stable release. >>>>>>>> Hence, >>>>>>>> need help from maintainers Ben/Mark/Justin/Han to address some open >>>>>>>> action >>>>>>>> items as it is more of org/ownership/rights related: >>>>>>>> >>>>>>>>1. Get new repo created under docker.io with name either >>>>>>>>ovs/ovn and declare it public repo >>>>>>>>2. How about copy-rights for running images for open source >>>>>>>>projects >>>>>>>>3. Storage: unlimited or some limited GBs >>>>>>>>4. Naming conventions for docker images ;e.g >>>>>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. >>>>>>>>Similar for ovs. >>>>>>>> >>>>>>>> >>>>>>>> Once this is done, we can bundle docker image changes in the same >>>>>>>> release process >>>>>>>> >>>>>>>> Please feel free to add any missing piece. 
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Sounds good. Looking forward to it. Just want to reiterate that this discussion is more about maintaining a docker image for each stable ovn/ovs upstream release and inputs from maintainers. We can start a separate thread for performance and other issues for running ovs/ovn in containers accordingly after your talk. On Fri, Nov 8, 2019 at 4:53 PM Shivaram Mysore wrote: > I am giving a talk about the same at OVS conference. Most of the info is > documented in the github repo. > > If that does not help, please post questions and I will help document the > same. > > /Shivaram > ::Sent from my mobile device:: > > On Nov 8, 2019, at 6:49 PM, aginwala wrote: > > > Hi Shivaram: > > Thanks for comments. Can you explain what is the bottleneck? Also for > addressing the performance related issues that you suggested, I would say if > you can submit a PR in the ovs repo mentioning to use additional docker options > for startup for better performance, it would be helpful. I did not get a > chance to try out additional options apart from the base ones as it just > does its job at least while running ovs/ovn in a pre-prod/testing env. Didn't > get a chance to scale test it. > > On Fri, Nov 8, 2019 at 3:35 PM Shivaram Mysore > wrote: > >> The point about the kernel module is correct - no need to include it in a docker >> image. It will not work. >> >> /Shivaram >> ::Sent from my mobile device:: >> >> On Nov 8, 2019, at 5:42 PM, aginwala wrote: >> >> >> openvswitch.ko ships default with newer kernel but if we want to use say >> stt, we need to build it with respective kernel for host on which we will >> run. Hence, to skip host level installation, we pack the modules in >> container. 
>> >> On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty wrote: >> >>> >>> >>> On Fri, 8 Nov 2019 at 14:18, aginwala wrote: >>> >>>> Hi all: >>>> >>>> >>>> I have pushed two images to public openvswitch org on docker.io for >>>> ovs and ovn; >>>> OVS for ubuntu with 4.15 kernel: >>>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* >>>> >>> >>> Why is the kernel important here? Is the OVS kernel module being packed? >>> >>> >>>> run as : docker run -itd --net=host --name=ovsdb-server >>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server >>>> docker run -itd --net=host --name=ovs-vswitchd >>>> --volumes-from=ovsdb-server --privileged >>>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd >>>> >>>> OVN debian docker image: *openvswitch/ovn:2.12_e60f2f2_debian_master* >>>> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last >>>> commit on master) >>>> Follow steps as per: >>>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst >>>> >>>> >>>> Thanks Guru for sorting out the access/cleanups for openvswitch org on >>>> docker.io. >>>> >>>> We can plan to align this docker push for each stable release ahead. >>>> >>>> >>>> >>>> On Fri, Nov 8, 2019 at 10:17 AM aginwala wrote: >>>> >>>>> Thanks Guru: >>>>> >>>>> Sounds good. Can you please grant user aginwala as admin? I can create >>>>> two repos ovs and ovn under openvswitch org and can push new stable >>>>> release >>>>> versions there. >>>>> >>>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: >>>>> >>>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: >>>>>> >>>>>>> I had created a openvswitch repo in docker as a placeholder. Happy >>>>>>> to provide it to whoever the admin is. >>>>>>> >>>>>> >>>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it >>>>>> has one stale image. 
>>>>>> >>>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while >>>>>> true; do echo hello world; sleep 1; done" >>>>>> >>>>>> So if we want the name "openvswitch", this is one option. If we >>>>>> prefer ovs/ovn or other keywords, then the admin can create a new one. >>>>
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Hi Shivaram: Thanks for comments. Can you explain what is the bottleneck? Also for addressing the performance related issues that you suggested, I would say if you can submit a PR in the ovs repo mentioning to use additional docker options for startup for better performance, it would be helpful. I did not get a chance to try out additional options apart from the base ones as it just does its job at least while running ovs/ovn in a pre-prod/testing env. Didn't get a chance to scale test it. On Fri, Nov 8, 2019 at 3:35 PM Shivaram Mysore wrote: > The point about the kernel module is correct - no need to include it in a docker > image. It will not work. > > /Shivaram > ::Sent from my mobile device:: > > On Nov 8, 2019, at 5:42 PM, aginwala wrote: > > > openvswitch.ko ships default with newer kernel but if we want to use say > stt, we need to build it with respective kernel for host on which we will > run. Hence, to skip host level installation, we pack the modules in > container. > > On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty wrote: > >> >> >> On Fri, 8 Nov 2019 at 14:18, aginwala wrote: > >> >>> Hi all: >>> >>> >>> I have pushed two images to public openvswitch org on docker.io for ovs >>> and ovn; >>> OVS for ubuntu with 4.15 kernel: >>> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* >>> >> >> Why is the kernel important here? Is the OVS kernel module being packed? >> >> >>> run as : docker run -itd --net=host --name=ovsdb-server >>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server >>> docker run -itd --net=host --name=ovs-vswitchd >>> --volumes-from=ovsdb-server --privileged >>> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd >>> >>> OVN debian docker image: *openvswitch/ovn:2.12_e60f2f2_debian_master* >>> as we don't have a branch cut out for ovn yet. 
(Hence, tagged it with last >>> commit on master) >>> Follow steps as per: >>> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst >>> >>> >>> Thanks Guru for sorting out the access/cleanups for openvswitch org on >>> docker.io. >>> >>> We can plan to align this docker push for each stable release ahead. >>> >>> >>> >>> On Fri, Nov 8, 2019 at 10:17 AM aginwala wrote: >>> >>>> Thanks Guru: >>>> >>>> Sounds good. Can you please grant user aginwala as admin? I can create >>>> two repos ovs and ovn under openvswitch org and can push new stable release >>>> versions there. >>>> >>>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: >>>> >>>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: >>>>> >>>>>> I had created a openvswitch repo in docker as a placeholder. Happy to >>>>>> provide it to whoever the admin is. >>>>>> >>>>> >>>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it >>>>> has one stale image. >>>>> >>>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while >>>>> true; do echo hello world; sleep 1; done" >>>>> >>>>> So if we want the name "openvswitch", this is one option. If we prefer >>>>> ovs/ovn or other keywords, then the admin can create a new one. >>>>> >>>>> >>>>>> >>>>>> On Thu, 7 Nov 2019 at 13:15, aginwala wrote: >>>>>> >>>>>>> Hi All: >>>>>>> >>>>>>> As discussed in the meeting today, we all agreed that it will be a >>>>>>> good idea to push docker images for each new ovs/ovn stable release. >>>>>>> Hence, >>>>>>> need help from maintainers Ben/Mark/Justin/Han to address some open >>>>>>> action >>>>>>> items as it is more of org/ownership/rights related: >>>>>>> >>>>>>>1. Get new repo created under docker.io with name either ovs/ovn >>>>>>>and declare it public repo >>>>>>>2. How about copy-rights for running images for open source >>>>>>>projects >>>>>>>3. Storage: unlimited or some limited GBs >>>>>>>4. 
Naming conventions for docker images ;e.g >>>>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. >>>>>>>Similar for ovs. >>>>>>> >>>>>>> >>>>>>> Once this is done, we can bundle docker image changes in the same >>>>>>> release process >>>>>>> >>>>>>> Please feel free to add any missing piece. >>>>>>> >>>>>>> ___ >>>>>>> discuss mailing list >>>>>>> disc...@openvswitch.org >>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>>>>>> >>>>>> ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] OVS/OVN docker image for each stable release
openvswitch.ko ships default with newer kernel but if we want to use say stt, we need to build it with respective kernel for host on which we will run. Hence, to skip host level installation , we pack the modules in container. On Fri, Nov 8, 2019 at 2:37 PM Guru Shetty wrote: > > > On Fri, 8 Nov 2019 at 14:18, aginwala wrote: > >> Hi all: >> >> >> I have pushed two images to public openvswitch org on docker.io for ovs >> and ovn; >> OVS for ubuntu with 4.15 kernel: >> *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* >> > > Why is the kernel important here? Is the OVS kernel module being packed? > > >> run as : docker run -itd --net=host --name=ovsdb-server >> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server >> docker run -itd --net=host --name=ovs-vswitchd >> --volumes-from=ovsdb-server --privileged >> openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd >> >> OVN debian docker image: *openvswitch/ovn:2.12_e60f2f2_debian_master* >> as we don't have a branch cut out for ovn yet. (Hence, tagged it with last >> commit on master) >> Follow steps as per: >> https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst >> >> >> Thanks Guru for sorting out the access/cleanups for openvswitch org on >> docker.io. >> >> We can plan to align this docker push for each stable release ahead. >> >> >> >> On Fri, Nov 8, 2019 at 10:17 AM aginwala wrote: >> >>> Thanks Guru: >>> >>> Sounds good. Can you please grant user aginwala as admin? I can create >>> two repos ovs and ovn under openvswitch org and can push new stable release >>> versions there. >>> >>> On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: >>> >>>> On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: >>>> >>>>> I had created a openvswitch repo in docker as a placeholder. Happy to >>>>> provide it to whoever the admin is. >>>>> >>>> >>>> i.e. You can use the keyword "openvswitch". For e.g., right now, it has >>>> one stale image. 
>>>> >>>> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while >>>> true; do echo hello world; sleep 1; done" >>>> >>>> So if we want the name "openvswitch", this is one option. If we prefer >>>> ovs/ovn or other keywords, then the admin can create a new one. >>>> >>>> >>>>> >>>>> On Thu, 7 Nov 2019 at 13:15, aginwala wrote: >>>>> >>>>>> Hi All: >>>>>> >>>>>> As discussed in the meeting today, we all agreed that it will be a >>>>>> good idea to push docker images for each new ovs/ovn stable release. >>>>>> Hence, >>>>>> need help from maintainers Ben/Mark/Justin/Han to address some open >>>>>> action >>>>>> items as it is more of org/ownership/rights related: >>>>>> >>>>>>1. Get new repo created under docker.io with name either ovs/ovn >>>>>>and declare it public repo >>>>>>2. How about copy-rights for running images for open source >>>>>>projects >>>>>>3. Storage: unlimited or some limited GBs >>>>>>4. Naming conventions for docker images ;e.g >>>>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. >>>>>>Similar for ovs. >>>>>> >>>>>> >>>>>> Once this is done, we can bundle docker image changes in the same >>>>>> release process >>>>>> >>>>>> Please feel free to add any missing piece. >>>>>> >>>>>> ___ >>>>>> discuss mailing list >>>>>> disc...@openvswitch.org >>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>>>>> >>>>> ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Hi all: I have pushed two images to public openvswitch org on docker.io for ovs and ovn; OVS for ubuntu with 4.15 kernel: *openvswitch/ovs:2.12.0_debian_4.15.0-66-generic* run as : docker run -itd --net=host --name=ovsdb-server openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovsdb-server docker run -itd --net=host --name=ovs-vswitchd --volumes-from=ovsdb-server --privileged openvswitch/ovs:2.12.0_debian_4.15.0-66-generic ovs-vswitchd OVN debian docker image: *openvswitch/ovn:2.12_e60f2f2_debian_master* as we don't have a branch cut out for ovn yet. (Hence, tagged it with last commit on master) Follow steps as per: https://github.com/ovn-org/ovn/blob/master/Documentation/intro/install/general.rst Thanks Guru for sorting out the access/cleanups for openvswitch org on docker.io. We can plan to align this docker push for each stable release ahead. On Fri, Nov 8, 2019 at 10:17 AM aginwala wrote: > Thanks Guru: > > Sounds good. Can you please grant user aginwala as admin? I can create two > repos ovs and ovn under openvswitch org and can push new stable release > versions there. > > On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: > >> On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: >> >>> I had created a openvswitch repo in docker as a placeholder. Happy to >>> provide it to whoever the admin is. >>> >> >> i.e. You can use the keyword "openvswitch". For e.g., right now, it has >> one stale image. >> >> docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while true; >> do echo hello world; sleep 1; done" >> >> So if we want the name "openvswitch", this is one option. If we prefer >> ovs/ovn or other keywords, then the admin can create a new one. >> >> >>> >>> On Thu, 7 Nov 2019 at 13:15, aginwala wrote: >>> >>>> Hi All: >>>> >>>> As discussed in the meeting today, we all agreed that it will be a good >>>> idea to push docker images for each new ovs/ovn stable release. 
Hence, need >>>> help from maintainers Ben/Mark/Justin/Han to address some open action items >>>> as it is more of org/ownership/rights related: >>>> >>>>1. Get new repo created under docker.io with name either ovs/ovn >>>>and declare it public repo >>>>2. How about copy-rights for running images for open source projects >>>>3. Storage: unlimited or some limited GBs >>>>4. Naming conventions for docker images ;e.g >>>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. Similar >>>>for ovs. >>>> >>>> >>>> Once this is done, we can bundle docker image changes in the same >>>> release process >>>> >>>> Please feel free to add any missing piece. >>>> >>>> ___ >>>> discuss mailing list >>>> disc...@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>>> >>> ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
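As a quick sanity check after starting the two containers with the run commands quoted above, the `--volumes-from` mount lets the vswitchd container reach the database socket. A sketch, assuming the container names from those commands:

```shell
# Verify both daemons came up.
docker ps --filter name=ovsdb-server --filter name=ovs-vswitchd

# ovs-vsctl inside the vswitchd container talks to ovsdb-server through
# the db socket shared via --volumes-from=ovsdb-server.
docker exec ovs-vswitchd ovs-vsctl show
docker exec ovs-vswitchd ovs-vsctl --version
```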
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Thanks Guru: Sounds good. Can you please grant user aginwala as admin? I can create two repos ovs and ovn under openvswitch org and can push new stable release versions there. On Fri, Nov 8, 2019 at 10:04 AM Guru Shetty wrote: > On Fri, 8 Nov 2019 at 09:53, Guru Shetty wrote: > >> I had created a openvswitch repo in docker as a placeholder. Happy to >> provide it to whoever the admin is. >> > > i.e. You can use the keyword "openvswitch". For e.g., right now, it has > one stale image. > > docker run -d --net=none openvswitch/ipam:v2.4.90 /bin/sh -c "while true; > do echo hello world; sleep 1; done" > > So if we want the name "openvswitch", this is one option. If we prefer > ovs/ovn or other keywords, then the admin can create a new one. > > >> >> On Thu, 7 Nov 2019 at 13:15, aginwala wrote: >> >>> Hi All: >>> >>> As discussed in the meeting today, we all agreed that it will be a good >>> idea to push docker images for each new ovs/ovn stable release. Hence, need >>> help from maintainers Ben/Mark/Justin/Han to address some open action items >>> as it is more of org/ownership/rights related: >>> >>>1. Get new repo created under docker.io with name either ovs/ovn and >>>declare it public repo >>>2. How about copy-rights for running images for open source projects >>>3. Storage: unlimited or some limited GBs >>>4. Naming conventions for docker images ;e.g >>>openswitch/ovn:2.13.1_debian or openswitch/ovn:2.13.1_rhel. Similar >>>for ovs. >>> >>> >>> Once this is done, we can bundle docker image changes in the same >>> release process >>> >>> Please feel free to add any missing piece. >>> >>> ___ >>> discuss mailing list >>> disc...@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>> >> ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] OVS/OVN docker image for each stable release
Thanks Shivaram. I will wait for maintainers to comment as it would be nice to host a docker image of at least one stable release to start with, which could be either 2.11/2.12 or the upcoming 2.13 version under the ovs/ovn org. What do you think? On Thu, Nov 7, 2019 at 3:56 PM Shivaram Mysore wrote: > Hi > If it is useful, we can start this with: > GitHub.com/ServiceFractal/ovs > > /Shivaram > ::Sent from my mobile device:: > > On Nov 7, 2019, at 4:15 PM, aginwala wrote: > > > Hi All: > > As discussed in the meeting today, we all agreed that it will be a good > idea to push docker images for each new ovs/ovn stable release. Hence, need > help from maintainers Ben/Mark/Justin/Han to address some open action items > as it is more of org/ownership/rights related: > >1. Get a new repo created under docker.io with name either ovs/ovn and >declare it a public repo >2. How about copyrights for running images for open source projects >3. Storage: unlimited or some limited GBs >4. Naming conventions for docker images; e.g. >openvswitch/ovn:2.13.1_debian or openvswitch/ovn:2.13.1_rhel. Similar >for ovs. > > > Once this is done, we can bundle docker image changes in the same release > process > > Please feel free to add any missing piece. > > ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] OVN RBAC role for ovn-northd?
Thanks Frode for covering that. I added minor comments to your PR; you can send a formal patch. On Thu, Nov 7, 2019 at 2:00 PM Frode Nordahl wrote: > fwiw; I proposed this small note earlier this evening: > https://github.com/ovn-org/ovn/pull/25 > > On Thu, 7 Nov 2019, 21:47, Ben Pfaff wrote: > >> Sure, anything helps. >> >> On Thu, Nov 07, 2019 at 12:27:44PM -0800, aginwala wrote: >> > Hi Ben: >> > >> > It seems RBAC doc >> > >> http://docs.openvswitch.org/en/stable/tutorials/ovn-rbac/#configuring-rbac >> > only talks >> > about chassis and not mentioning about northd. I can submit a patch to >> > update that as a todo for northd and mention the workaround until we add >> > formal support. Is that ok? >> > >> > >> > >> > >> > On Thu, Nov 7, 2019 at 12:14 PM Ben Pfaff wrote: >> > >> > > Have we documented this? Should we? >> > > >> > > On Thu, Nov 07, 2019 at 10:20:22AM -0800, aginwala wrote: >> > > > Hi: >> > > > >> > > > It is a known fact and have-been discussed before. We use the same >> > > > workaround as you mentioned. Alternatively, you can also set >> role="" and >> > > it >> > > > will work for both northd and ovn-controller instead of separate >> > > listeners >> > > > which is also a security loop-hole. In short, some work is needed >> here >> > > > to handle rbac for northd. >> > > > >> > > > On Thu, Nov 7, 2019 at 9:47 AM Frode Nordahl < >> > > frode.nord...@canonical.com> >> > > > wrote: >> > > > >> > > > > Hello all, >> > > > > >> > > > > TL;DR; When enabling the `ovn-controller` role on the SB DB >> > > `ovsdb-server` >> > > > > listener, `ovn-northd` no longer has the necessary access to do >> its job >> > > > > when you are unable to use the local unix socket for its >> connection to >> > > the >> > > > > database. >> > > > > >> > > > > AFAICT there is no northd-specifc or admin type role available, >> have I >> > > > > missed something? 
>> > > > > >> > > > > I have worked around the issue by enabling a separate listener on >> a >> > > > > different port on the Southbound ovsdb-servers so that >> `ovn-northd` can >> > > > > connect to that. >> > > > > >> > > > > >> > > > > I have a OVN deployment with central components spread across >> three >> > > > > machines, there is an instance of the Northbound and Southbound >> > > > > `ovsdb-server` on each of them which are clustered, and there is >> also >> > > an >> > > > > instance of `ovn-northd` on each of them. >> > > > > >> > > > > The deployment is TLS-enabled and I have enabled RBAC. >> > > > > >> > > > > Since the DBs are clustered I have no control of which machine >> will be >> > > the >> > > > > leader, and it may be that one machine has the leader for the >> > > Northbound DB >> > > > > and a different machine has the leader of the Southbound DB. >> > > > > >> > > > > Because of this ovn-northd is unable to talk to the databases >> through a >> > > > > local unix socket and must use a TLS-enabled connection to the >> DBs, and >> > > > > herein lies the problem. >> > > > > >> > > > > >> > > > > I peeked at the RBAC implementation, and it appears to me that the >> > > > > permission system is tied to having specific columns in each >> table that >> > > > > maps to the name of the client that wants permission. On the >> surface >> > > this >> > > > > appears to not fit with `ovn-northd`'s needs as I would think it >> would >> > > need >> > > > > full access to all tables perhaps based on a centrally managed >> set of >> > > > > hostnames. 
>> > > > > >> > > > > -- >> > > > > Frode Nordahl >> > > > > >> > > > > ___ >> > > > > discuss mailing list >> > > > > disc...@openvswitch.org >> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >> > > > > >> > > >> > > > ___ >> > > > discuss mailing list >> > > > disc...@openvswitch.org >> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >> > > >> > > >> > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
[ovs-discuss] OVS/OVN docker image for each stable release
Hi All: As discussed in the meeting today, we all agreed that it would be a good idea to push docker images for each new ovs/ovn stable release. Hence, we need help from the maintainers Ben/Mark/Justin/Han to address some open action items, as they are more of an org/ownership/rights matter:

1. Get a new repo created under docker.io, named either ovs or ovn, and declare it a public repo.
2. How about copyrights for running images for open-source projects?
3. Storage: unlimited or some limited GBs?
4. Naming conventions for docker images; e.g., openvswitch/ovn:2.13.1_debian or openvswitch/ovn:2.13.1_rhel. Similar for ovs.

Once this is done, we can bundle the docker image changes into the same release process. Please feel free to add any missing piece. ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
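If it helps move the action items along, the per-release publishing step could look roughly like the following. This is only a sketch: it assumes the docker.io org ends up being named openvswitch and that a per-distro Dockerfile exists in the release tree; the tag format follows the naming convention proposed above, which is not yet agreed upon.

```shell
# Hypothetical release flow; org name, Dockerfile path, and tag format
# are assumptions pending the action items above.
VERSION=2.13.1
DISTRO=debian

# Build the image from the release checkout.
docker build -t openvswitch/ovn:${VERSION}_${DISTRO} -f Dockerfile.${DISTRO} .

# Push to Docker Hub (requires an account with push rights on the org,
# i.e. action item 1).
docker login
docker push openvswitch/ovn:${VERSION}_${DISTRO}
```

Repeating the same two commands for an ovs image would let the whole step be folded into the existing release checklist.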
Re: [ovs-discuss] OVN RBAC role for ovn-northd?
Hi Ben: It seems the RBAC doc http://docs.openvswitch.org/en/stable/tutorials/ovn-rbac/#configuring-rbac only talks about chassis and does not mention northd. I can submit a patch to document this as a TODO for northd and mention the workaround until we add formal support. Is that ok? On Thu, Nov 7, 2019 at 12:14 PM Ben Pfaff wrote: > Have we documented this? Should we? > > On Thu, Nov 07, 2019 at 10:20:22AM -0800, aginwala wrote: > > Hi: > > > > It is a known fact and have-been discussed before. We use the same > > workaround as you mentioned. Alternatively, you can also set role="" and > it > > will work for both northd and ovn-controller instead of separate > listeners > > which is also a security loop-hole. In short, some work is needed here > > to handle rbac for northd. > > > > On Thu, Nov 7, 2019 at 9:47 AM Frode Nordahl < > frode.nord...@canonical.com> > > wrote: > > > > > Hello all, > > > > > > TL;DR; When enabling the `ovn-controller` role on the SB DB > `ovsdb-server` > > > listener, `ovn-northd` no longer has the necessary access to do its job > > > when you are unable to use the local unix socket for its connection to > the > > > database. > > > > > > AFAICT there is no northd-specifc or admin type role available, have I > > > missed something? > > > > > > I have worked around the issue by enabling a separate listener on a > > > different port on the Southbound ovsdb-servers so that `ovn-northd` can > > > connect to that. > > > > > > > > > I have a OVN deployment with central components spread across three > > > machines, there is an instance of the Northbound and Southbound > > > `ovsdb-server` on each of them which are clustered, and there is also > an > > > instance of `ovn-northd` on each of them. > > > > > > The deployment is TLS-enabled and I have enabled RBAC. 
> > > > > > Since the DBs are clustered I have no control of which machine will be > the > > > leader, and it may be that one machine has the leader for the > Northbound DB > > > and a different machine has the leader of the Southbound DB. > > > > > > Because of this ovn-northd is unable to talk to the databases through a > > > local unix socket and must use a TLS-enabled connection to the DBs, and > > > herein lies the problem. > > > > > > > > > I peeked at the RBAC implementation, and it appears to me that the > > > permission system is tied to having specific columns in each table that > > > maps to the name of the client that wants permission. On the surface > this > > > appears to not fit with `ovn-northd`'s needs as I would think it would > need > > > full access to all tables perhaps based on a centrally managed set of > > > hostnames. > > > > > > -- > > > Frode Nordahl > > > > > > ___ > > > discuss mailing list > > > disc...@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > > > > > > ___ > > discuss mailing list > > disc...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] OVN RBAC role for ovn-northd?
Hi: It is a known limitation and has been discussed before. We use the same workaround as you mentioned. Alternatively, you can also set role="" and it will work for both northd and ovn-controller instead of separate listeners, but that is also a security loophole. In short, some work is needed here to handle RBAC for northd. On Thu, Nov 7, 2019 at 9:47 AM Frode Nordahl wrote: > Hello all, > > TL;DR; When enabling the `ovn-controller` role on the SB DB `ovsdb-server` > listener, `ovn-northd` no longer has the necessary access to do its job > when you are unable to use the local unix socket for its connection to the > database. > > AFAICT there is no northd-specifc or admin type role available, have I > missed something? > > I have worked around the issue by enabling a separate listener on a > different port on the Southbound ovsdb-servers so that `ovn-northd` can > connect to that. > > > I have a OVN deployment with central components spread across three > machines, there is an instance of the Northbound and Southbound > `ovsdb-server` on each of them which are clustered, and there is also an > instance of `ovn-northd` on each of them. > > The deployment is TLS-enabled and I have enabled RBAC. > > Since the DBs are clustered I have no control of which machine will be the > leader, and it may be that one machine has the leader for the Northbound DB > and a different machine has the leader of the Southbound DB. > > Because of this ovn-northd is unable to talk to the databases through a > local unix socket and must use a TLS-enabled connection to the DBs, and > herein lies the problem. > > > I peeked at the RBAC implementation, and it appears to me that the > permission system is tied to having specific columns in each table that > maps to the name of the client that wants permission. On the surface this > appears to not fit with `ovn-northd`'s needs as I would think it would need > full access to all tables perhaps based on a centrally managed set of > hostnames. 
> > -- > Frode Nordahl > > ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
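For the archives, the separate-listener workaround described above can be expressed as an extra row in the southbound Connection table. A rough, untested sketch (the port numbers and the use of pssl are illustrative; adjust to your deployment):

```shell
# One RBAC-restricted listener for chassis (ovn-controller), plus an
# unrestricted listener on another port that only ovn-northd should reach.
ovn-sbctl -- --id=@c1 create Connection target=\"pssl:6642\" role=ovn-controller \
          -- --id=@c2 create Connection target=\"pssl:16642\" \
          -- set SB_Global . connections=@c1,@c2
```

Since the second listener bypasses RBAC entirely, it should be kept reachable only from the central nodes, e.g. via firewall rules.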
Re: [ovs-discuss] Running OVS on a Container
Hi: Also wanted to point out that steps for building/running ovs as a container are also mentioned in ovs installation doc https://raw.githubusercontent.com/openvswitch/ovs/1ca0323e7c29dc7ef5a615c265df0460208f92de/Documentation/intro/install/general.rst. OVS docker scripts are in https://github.com/openvswitch/ovs/tree/master/utilities/docker On Wed, Oct 9, 2019 at 12:24 PM Shivaram Mysore wrote: > Thanks for the information. I did not know about this. I had a chance to > quickly review the links provided and [1]. I could not get a good > understanding of how this would work *without* a Openstack environment. > In my work, I focussed on how we could make OVS on containers work without > depending on other software - ex. running on a basic Container OS like > CoreOS. I like the fact that you have DPDK support and am curious to > better understand the same. > > [1] > https://docs.openstack.org/ocata/networking-guide/deploy-ovs-selfservice.html > > > /Shivaram > > On Wed, Oct 9, 2019 at 11:56 AM MEHAN, MUNISH wrote: > >> You can run even OVS-DPDK as container. Here are the build and install >> details. >> https://review.opendev.org/#/q/topic:ovsdpdk++status:merged >> >> >> On 10/9/19, 2:42 PM, "ovs-discuss-boun...@openvswitch.org on behalf of >> Ben Pfaff" >> wrote: >> >> On Tue, Oct 08, 2019 at 07:35:09PM -0700, Shivaram Mysore wrote: >> > If you want to run OVS on a container, you can now: >> > >> > $ docker pull shivarammysore/ovs >> > >> > Source: >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_servicefractal_ovs=DwICAg=LFYZ-o9_HUMeMTSQicvjIg=N738y1IimAWlyruQInDOeXAycuJ8SprO4UwKwlj95So=BV5H7MLgsyG6dggZ3sHCCO3YrIHpl81BQ1C4T2DC28M=u9P90lo4XUsaajIaaChWiNO9h2zRJt6ULAZ8B-S_H5g= >> > >> > Don't forget to check out the docs directory in the repo where I >> have a few >> > more details. 
>> >> Someone said on Twitter, when I posted about this, that the OpenStack >> Kolla project also runs OVS in a container: >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__twitter.com_MichalNasiadka_status_1181807956125474817-3Fs-3D20=DwICAg=LFYZ-o9_HUMeMTSQicvjIg=N738y1IimAWlyruQInDOeXAycuJ8SprO4UwKwlj95So=BV5H7MLgsyG6dggZ3sHCCO3YrIHpl81BQ1C4T2DC28M=PKXAb4I9Ln2vqkXAd5QxLewNuCeAyU55zyje3BNaZVY= >> This was the first I've heard of that and I wonder whether you've had >> a >> chance to look at their implementation? >> ___ >> discuss mailing list >> disc...@openvswitch.org >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddiscuss=DwICAg=LFYZ-o9_HUMeMTSQicvjIg=N738y1IimAWlyruQInDOeXAycuJ8SprO4UwKwlj95So=BV5H7MLgsyG6dggZ3sHCCO3YrIHpl81BQ1C4T2DC28M=VAp-PAekbY3wbZJTpmA8o2CAuKl0VKdPKz3-CRYlKpg= >> >> >> ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] [ovs-dev] Hypervisor down during upgrade OVS 2.10.x to 2.10.y
Hi: Adding the correct ovs-discuss ML. I did get a chance to take a look at it. I think this is a bug in the 4.4.0-104-generic kernel on Ubuntu 16.04, as is being discussed on the Ubuntu tracker https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407, where it can be hit all of a sudden, matching the kernel logs shared: "unregister_netdevice: waiting for br0 to become free. Usage count = 1". Folks on that thread propose upgrading to a newer kernel to get rid of the issue. Upstream Linux proposed relevant fixes at https://github.com/torvalds/linux/commit/ee60ad219f5c7c4fb2f047f88037770063ef785f to address related issues. I guess the kernel folks can comment on this more; not sure if I missed anything else. Maybe we can make some improvements in force-reload-kmod: right now stop_forwarding stops ovs-vswitchd, but the system stalls because br0 (eth0 is added to br0) is busy, causing network connectivity loss. In the current case the host recovers only after a restart. Not sure if we need to handle this corner case in OVS? On Wed, Aug 28, 2019 at 2:21 PM Jin, Liang via dev wrote: > > Hi, > We upgrade the OVS recently from one version 2.10 to another version > 2.10. on some HV upgrade, the HV is down when running force reload kernel. > In the ovs-ctl log, kill ovs-vswitch is failed, but the script is still > going to reload the modules. 
> ``` > ovsdb-server is running with pid 2431 > ovs-vswitchd is running with pid 2507 > Thu Aug 22 23:13:49 UTC 2019:stop > 2019-08-22T23:13:50Z|1|fatal_signal|WARN|terminating with signal 14 > (Alarm clock) > Alarm clock > 2019-08-22T23:13:51Z|1|fatal_signal|WARN|terminating with signal 14 > (Alarm clock) > Alarm clock > * Exiting ovs-vswitchd (2507) > * Killing ovs-vswitchd (2507) > * Killing ovs-vswitchd (2507) with SIGKILL > * Killing ovs-vswitchd (2507) failed > * Exiting ovsdb-server (2431) > Thu Aug 22 23:14:58 UTC 2019:load-kmod > Thu Aug 22 23:14:58 UTC 2019:start --system-id=random --no-full-hostname > /usr/share/openvswitch/scripts/ovs-ctl: unknown option > "--no-full-hostname" (use --help for help) > * Starting ovsdb-server > * Configuring Open vSwitch system IDs > * ovs-vswitchd is already running > * Enabling remote OVSDB managers > ovsdb-server is running with pid 3860447 > ovs-vswitchd is running with pid 2507 > ovsdb-server is running with pid 3860447 > ovs-vswitchd is running with pid 2507 > Thu Aug 22 23:15:09 UTC 2019:load-kmod > Thu Aug 22 23:15:09 UTC 2019:force-reload-kmod --system-id=random > --no-full-hostname > /usr/share/openvswitch/scripts/ovs-ctl: unknown option > "--no-full-hostname" (use --help for help) > * Detected internal interfaces: br-int > Thu Aug 22 23:37:08 UTC 2019:stop > 2019-08-22T23:37:09Z|1|fatal_signal|WARN|terminating with signal 14 > (Alarm clock) > Alarm clock > 2019-08-22T23:37:10Z|1|fatal_signal|WARN|terminating with signal 14 > (Alarm clock) > Alarm clock > * Exiting ovs-vswitchd (2507) > * Killing ovs-vswitchd (2507) > * Killing ovs-vswitchd (2507) with SIGKILL > * Killing ovs-vswitchd (2507) failed > * Exiting ovsdb-server (3860447) > Thu Aug 22 23:40:42 UTC 2019:load-kmod > * Inserting openvswitch module > Thu Aug 22 23:40:42 UTC 2019:start --system-id=random --no-full-hostname > /usr/share/openvswitch/scripts/ovs-ctl: unknown option > "--no-full-hostname" (use --help for help) > * Starting ovsdb-server > * 
Configuring Open vSwitch system IDs > * Starting ovs-vswitchd > * Enabling remote OVSDB managers > ovsdb-server is running with pid 2399 > ovs-vswitchd is running with pid 2440 > ovsdb-server is running with pid 2399 > ovs-vswitchd is running with pid 2440 > Thu Aug 22 23:46:18 UTC 2019:load-kmod > Thu Aug 22 23:46:18 UTC 2019:force-reload-kmod --system-id=random > --no-full-hostname > /usr/share/openvswitch/scripts/ovs-ctl: unknown option > "--no-full-hostname" (use --help for help) > * Detected internal interfaces: br-int br0 > * Saving flows > * Exiting ovsdb-server (2399) > * Starting ovsdb-server > * Configuring Open vSwitch system IDs > * Flush old conntrack entries > * Exiting ovs-vswitchd (2440) > * Saving interface configuration > * Removing datapath: system@ovs-system > * Removing openvswitch module > rmmod: ERROR: Module vxlan is in use by: i40e > * Forcing removal of vxlan module > * Inserting openvswitch module > * Starting ovs-vswitchd > * Restoring saved flows > * Enabling remote OVSDB managers > * Restoring interface configuration > ``` > > But in kern.log, we see the log as below, the process could not exit > because waiting br0 release, and then, the ovs-ctl try to `kill term` and > `kill -9` the process, it does not work, because kernel is in infinity > loop. Then, ovs-ctl try to save the flows, when save flow, core dump > happened in kernel. Then HV is down until restart it. > ``` > Aug 22 16:13:45 slx11c-9gjm kernel: [21177057.998961] device br0 left > promiscuous mode > Aug 22 16:13:55 slx11c-9gjm kernel: [21177068.044859] > unregister_netdevice: waiting for br0 to become
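As a stopgap on the operator's side (an assumption, not a change that went into ovs-ctl), one could refuse to run force-reload-kmod when the kernel is already stuck on a netdevice refcount, since the reload cannot succeed at that point and only a reboot recovers the host:

```shell
# Bail out if the kernel is already spinning on a device refcount
# (the "unregister_netdevice: waiting for br0 to become free" symptom
# seen in kern.log above); otherwise proceed with the reload.
if dmesg | grep -q 'unregister_netdevice: waiting for'; then
    echo "kernel stuck releasing a netdevice; reboot host before reloading" >&2
    exit 1
fi
/usr/share/openvswitch/scripts/ovs-ctl force-reload-kmod
```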
Re: [ovs-discuss] OpenVswitch
Not sure what steps you used to compile and install 2.11. Use `export OVS_RUNDIR="/var/run/openvswitch"` and then retry the ovs-vsctl commands. On Wed, Aug 21, 2019 at 2:43 AM V Sai Surya Laxman Rao Bellala < laxmanraobell...@gmail.com> wrote: > Hello all, > > Can anyone help me in solving this Bug? > I installed OVS-2.11 latest version and when i am adding the bridge to the > openvswitch.I am getting the below error. > > *ovs-vsctl: unix://var/run/openvswitch/db.sock: database connection failed > (No such file or directory* > > Please help me in solving this problem > > Regards > Laxman > ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
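Concretely, assuming a default install layout (the path below is the usual default; a source build configured with a different rundir would differ), the check looks like:

```shell
# Check that ovsdb-server's control socket actually exists.
ls -l /var/run/openvswitch/db.sock

# Point the CLI tools at that rundir and retry.
export OVS_RUNDIR=/var/run/openvswitch
ovs-vsctl show
```

If db.sock is missing entirely, ovsdb-server itself is not running and needs to be started first.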
Re: [ovs-discuss] [OVN] ovn-controller Incremental Processing scale testing
Hi: As per the IRC meeting discussion, some nice findings were already discussed by Numan (thanks for sharing the details). Changing external_ids for a claimed port, e.g. ovn-nbctl set logical_switch_port sw0-port1 external_ids:foo=bar, triggers re-computation on the local compute; I see the same behavior. Numan is proposing a patch to skip computation for the external_ids column of an already claimed port in the port_binding table, because of "runtime_data, can't handle change for input SB_port_binding, fall back to recompute" ( https://github.com/openvswitch/ovs/blob/master/ovn/lib/inc-proc-eng.h#L77). However, I don't see external_ids in the port_binding table being set explicitly for the port when the Interface table is set in the test code that Daniel posted [1], which could trigger extra re-computation in the current test scenario. Also, ovs-vsctl add-br test triggers re-computation on the local compute, and yes, I can see the same. Since we don't have any handlers for the Ports and Interfaces tables similar to port_binding and the other handlers at https://github.com/openvswitch/ovs/blob/master/ovn/controller/ovn-controller.c#L1769, adding a new bridge also causes re-computation on the local compute. Not sure if it's required immediately because, as per the patch shared by Daniel [1], I don't see any new test bridges getting created apart from br-int, so there won't be much impact. Or maybe I missed that they are also creating test bridges during testing. Of course, any new ovs-vsctl command for attaching/detaching a VIF will surely trigger a recompute on br-int as and when the VIF (VM) gets added/deleted, to program the flows on the local compute. I didn't get a chance to verify that when a chassisredirect port is claimed on a gateway chassis it triggers computation on all computes registered with the SB, as per the code at https://github.com/openvswitch/ovs/blob/master/ovn/controller/binding.c#L722, which also raises the further chassisredirect-flow optimization that Numan is suggesting. 1. 
https://github.com/danalsan/browbeat/commit/0ff72da52ddf17aa9f7269f191eebd890899bdad On Fri, Jun 21, 2019 at 12:32 AM Han Zhou wrote: > > > On Thu, Jun 20, 2019 at 11:42 PM Numan Siddique > wrote: > > > > > > > > On Fri, Jun 21, 2019, 11:47 AM Han Zhou wrote: > >> > >> > >> > >> On Tue, Jun 11, 2019 at 9:16 AM Daniel Alvarez Sanchez < > dalva...@redhat.com> wrote: > >> > > >> > Thanks a lot Han for the answer! > >> > > >> > On Tue, Jun 11, 2019 at 5:57 PM Han Zhou wrote: > >> > > > >> > > > >> > > > >> > > > >> > > On Tue, Jun 11, 2019 at 5:12 AM Dumitru Ceara > wrote: > >> > > > > >> > > > On Tue, Jun 11, 2019 at 10:40 AM Daniel Alvarez Sanchez > >> > > > wrote: > >> > > > > > >> > > > > Hi Han, all, > >> > > > > > >> > > > > Lucas, Numan and I have been doing some 'scale' testing of > OpenStack > >> > > > > using OVN and wanted to present some results and issues that > we've > >> > > > > found with the Incremental Processing feature in > ovn-controller. Below > >> > > > > is the scenario that we executed: > >> > > > > > >> > > > > * 7 baremetal nodes setup: 3 controllers (running > >> > > > > ovn-northd/ovsdb-servers in A/P with pacemaker) + 4 compute > nodes. OVS > >> > > > > 2.10. > >> > > > > * The test consists on: > >> > > > > - Create openstack network (OVN LS), subnet and router > >> > > > > - Attach subnet to the router and set gw to the external > network > >> > > > > - Create an OpenStack port and apply a Security Group (ACLs > to allow > >> > > > > UDP, SSH and ICMP). > >> > > > > - Bind the port to one of the 4 compute nodes (randomly) by > >> > > > > attaching it to a network namespace. > >> > > > > - Wait for the port to be ACTIVE in Neutron ('up == True' in > NB) > >> > > > > - Wait until the test can ping the port > >> > > > > * Running browbeat/rally with 16 simultaneous process to > execute the > >> > > > > test above 150 times. 
> >> > > > > * When all the 150 'fake VMs' are created, browbeat will delete > all > >> > > > > the OpenStack/OVN resources. > >> > > > > > >> > > > > We first tried with OVS/OVN 2.10 and pulled some results which > showed > >> > > > > 100% success but ovn-controller is quite loaded (as expected) > in all > >> > > > > the nodes especially during the deletion phase: > >> > > > > > >> > > > > - Compute node: https://imgur.com/a/tzxfrIR > >> > > > > - Controller node (ovn-northd and ovsdb-servers): > https://imgur.com/a/8ffKKYF > >> > > > > > >> > > > > After conducting the tests above, we replaced ovn-controller in > all 7 > >> > > > > nodes by the one with the current master branch (actually from > last > >> > > > > week). We also replaced ovn-northd and ovsdb-servers but the > >> > > > > ovs-vswitchd has been left untouched (still on 2.10). The > expected > >> > > > > results were to get less ovn-controller CPU usage and also > better > >> > > > > times due to the Incremental Processing feature introduced > recently. > >> > > > > However, the results don't look very good: > >> > > > > > >> > > > >
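For anyone wanting to reproduce the two recompute triggers discussed in this thread, the checks boil down to something like the following (the switch/port names come from the example in the message above; watching ovn-controller's CPU is just one crude way to spot a full recompute):

```shell
# Trigger 1: touch external_ids on an already-claimed port.
ovn-nbctl set logical_switch_port sw0-port1 external_ids:foo=bar

# Trigger 2: add an unrelated bridge on the compute node.
ovs-vsctl add-br test

# Watch ovn-controller on the compute node for a CPU spike
# coinciding with each change.
top -b -n 1 -p "$(pidof ovn-controller)"
```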
Re: [ovs-discuss] Raft issues while removing a node
Hi Ben: I cannot see the patch series on patchwork. Is it due to a mail server sync issue or something else? Not sure if it's appropriate to try out https://github.com/blp/ovs-reviews/commits/raft-fixes since it has the patches in review in addition to some other patches? Regards, On Thu, Nov 15, 2018 at 9:18 AM Ben Pfaff wrote: > On Thu, Nov 08, 2018 at 04:17:03PM -0800, ramteja tadishetti wrote: > > I am facing trouble in graceful removal of node in a 3 Node RAFT setup. > > Thanks for the report. I followed up on it and found a number of bugs > in the implementation of the "kick" request. There is a patch series > out that fixes all of the bugs that I identified: > > https://patchwork.ozlabs.org/project/openvswitch/list/?series=76115 > ___ > discuss mailing list > disc...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode
Cool! Thanks a lot. On Mon, Sep 10, 2018 at 12:57 AM Numan Siddique wrote: > > > On Sun, Sep 9, 2018 at 8:38 AM aginwala wrote: > >> Hi: >> >> As consented with approach 1, I tested it. DB data is retained even for >> the continuous fail-over scenario where all 3 nodes are started/stopped at >> the same time multiple times in a loop. Also, works as expected in the >> normal failover scenarios. >> >> Since you also asked to test failing process_notification, I did >> introduce 10 sec sleep after line >> https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L604 >> which actually resulted in pacemaker failure with unknown error for 2 slave >> nodes but the function did not report any error messages that I was >> logging. DB data was still intact since it always promoted the 3rd node as >> master. >> >> >> Output for above failure test: >> Online: [ test-pace1-2365293 test-pace2-2365308 test-pace3-2598581 ] >> >> Full list of resources: >> >> Master/Slave Set: ovndb_servers-master [ovndb_servers] >> ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace3-2598581 >> (unmanaged) >> ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace2-2365308 >> (unmanaged) >> Masters: [ test-pace1-2365293 ] >> >> Failed Actions: >> * ovndb_servers_stop_0 on test-pace3-2598581 'unknown error' (1): >> call=12, status=Timed Out, exitreason='none', >> last-rc-change='Sat Sep 8 19:22:20 2018', queued=0ms, exec=20003ms >> * ovndb_servers_stop_0 on test-pace2-2365308 'unknown error' (1): >> call=12, status=Timed Out, exitreason='none', >> last-rc-change='Sat Sep 8 19:22:20 2018', queued=0ms, exec=20002ms >> >> >> Another way I tried to intentionally set error to some non-null string >> that skipped calling process_notification which does wipes out whole db >> when that node is promoted because of no notification updates. Was this the >> approach you wanted to test or some other way (correct me if I am wrong)? 
>> >> Also wanted to say, if you can add a info log statement in the formal >> patch during reset_database function as I used the same in my env which >> makes clear from log too about the failover behavior. >> > > Thanks for testing it out. I sent a formal patch here adding the log > message as suggested by you - https://patchwork.ozlabs.org/patch/967888/ > > Regards > Numan > > >> >> As you guys mentioned, not sure what other corner case might have been >> missed but this patch LGTM overall (safer than the current code that wipes >> out the db :)) >> >> Regards, >> >> On Wed, Sep 5, 2018 at 1:24 PM Han Zhou wrote: >> >>> >>> >>> On Wed, Sep 5, 2018 at 10:44 AM aginwala wrote: >>> > >>> > Thanks Numan: >>> > >>> > I will give it shot and update the findings. >>> > >>> > >>> > On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique >>> wrote: >>> >> >>> >> >>> >> >>> >> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou wrote: >>> >>> >>> >>> >>> >>> >>> >>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique >>> wrote: >>> >>> > >>> >>> > >>> >>> > >>> >>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff wrote: >>> >>> >> >>> >>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote: >>> >>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala >>> wrote: >>> >>> >> > > >>> >>> >> > > >>> >>> >> > > To add on , we are using LB VIP IP and no constraint with 3 >>> nodes as Han >>> >>> >> > mentioned earlier where active node have syncs from invalid IP >>> and rest >>> >>> >> > two nodes sync from LB VIP IP. Also, I was able to get some >>> logs from one >>> >>> >> > node that triggered: >>> >>> >> > >>> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460 >>> >>> >> > > >>> >>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp: >>> 10.189.208.16:50686: >>> >>> >> > entering RECONNECT >>> >>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_
Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode
Hi: As agreed on approach 1, I tested it. DB data is retained even for the continuous fail-over scenario where all 3 nodes are started/stopped at the same time multiple times in a loop. It also works as expected in the normal failover scenarios. Since you also asked to test a failing process_notification, I introduced a 10 sec sleep after line https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L604, which resulted in a pacemaker failure with 'unknown error' for the 2 slave nodes, but the function did not report any of the error messages that I was logging. DB data was still intact since it always promoted the 3rd node as master.

Output for the above failure test:

Online: [ test-pace1-2365293 test-pace2-2365308 test-pace3-2598581 ]

Full list of resources:

Master/Slave Set: ovndb_servers-master [ovndb_servers]
ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace3-2598581 (unmanaged)
ovndb_servers (ocf::ovn:ovndb-servers): FAILED test-pace2-2365308 (unmanaged)
Masters: [ test-pace1-2365293 ]

Failed Actions:
* ovndb_servers_stop_0 on test-pace3-2598581 'unknown error' (1): call=12, status=Timed Out, exitreason='none', last-rc-change='Sat Sep 8 19:22:20 2018', queued=0ms, exec=20003ms
* ovndb_servers_stop_0 on test-pace2-2365308 'unknown error' (1): call=12, status=Timed Out, exitreason='none', last-rc-change='Sat Sep 8 19:22:20 2018', queued=0ms, exec=20002ms

Another way, I tried intentionally setting error to some non-null string, which skipped calling process_notification; that wipes out the whole DB when that node is promoted, because no notification updates arrive. Was this the approach you wanted to test, or some other way (correct me if I am wrong)? 
As you guys mentioned, not sure what other corner case might have been missed but this patch LGTM overall (safer than the current code that wipes out the db :)) Regards, On Wed, Sep 5, 2018 at 1:24 PM Han Zhou wrote: > > > On Wed, Sep 5, 2018 at 10:44 AM aginwala wrote: > > > > Thanks Numan: > > > > I will give it shot and update the findings. > > > > > > On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique > wrote: > >> > >> > >> > >> On Wed, Sep 5, 2018 at 12:42 AM Han Zhou wrote: > >>> > >>> > >>> > >>> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique > wrote: > >>> > > >>> > > >>> > > >>> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff wrote: > >>> >> > >>> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote: > >>> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala > wrote: > >>> >> > > > >>> >> > > > >>> >> > > To add on , we are using LB VIP IP and no constraint with 3 > nodes as Han > >>> >> > mentioned earlier where active node have syncs from invalid IP > and rest > >>> >> > two nodes sync from LB VIP IP. Also, I was able to get some logs > from one > >>> >> > node that triggered: > >>> >> > > https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460 > >>> >> > > > >>> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp: > 10.189.208.16:50686: > >>> >> > entering RECONNECT > >>> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp: > >>> >> > 10.189.208.16:50686: disconnecting (removing OVN_Northbound > database due to > >>> >> > server termination) > >>> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp: > >>> >> > 10.189.208.21:56160: disconnecting (removing _Server database > due to server > >>> >> > termination) > >>> >> > > 20 > >>> >> > > > >>> >> > > I am not sure if sync_from on active node too via some invalid > ip is > >>> >> > causing some flaw when all are down during the race condition in > this > >>> >> > corner case. 
Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode
Thanks Numan: I will give it shot and update the findings. On Wed, Sep 5, 2018 at 5:35 AM Numan Siddique wrote: > > > On Wed, Sep 5, 2018 at 12:42 AM Han Zhou wrote: > >> >> >> On Sun, Sep 2, 2018 at 11:01 PM Numan Siddique >> wrote: >> > >> > >> > >> > On Fri, Aug 10, 2018 at 3:59 AM Ben Pfaff wrote: >> >> >> >> On Thu, Aug 09, 2018 at 09:32:21AM -0700, Han Zhou wrote: >> >> > On Thu, Aug 9, 2018 at 1:57 AM, aginwala wrote: >> >> > > >> >> > > >> >> > > To add on , we are using LB VIP IP and no constraint with 3 nodes >> as Han >> >> > mentioned earlier where active node have syncs from invalid IP and >> rest >> >> > two nodes sync from LB VIP IP. Also, I was able to get some logs >> from one >> >> > node that triggered: >> >> > >> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460 >> >> > > >> >> > > 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp: >> 10.189.208.16:50686: >> >> > entering RECONNECT >> >> > > 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp: >> >> > 10.189.208.16:50686: disconnecting (removing OVN_Northbound >> database due to >> >> > server termination) >> >> > > 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp: >> >> > 10.189.208.21:56160: disconnecting (removing _Server database due >> to server >> >> > termination) >> >> > > 20 >> >> > > >> >> > > I am not sure if sync_from on active node too via some invalid ip >> is >> >> > causing some flaw when all are down during the race condition in this >> >> > corner case. 
>> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique >> wrote: >> >> > >> >> >> > >> >> >> > >> >> >> > >> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff wrote: >> >> > >>> >> >> > >>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote: >> >> > >>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff >> wrote: >> >> > >>> > > >> >> > >>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote: >> >> > >>> > > > Hi, >> >> > >>> > > > >> >> > >>> > > > We found an issue in our testing (thanks aginwala) with >> >> > active-backup >> >> > >>> > mode >> >> > >>> > > > in OVN setup. >> >> > >>> > > > In the 3 node setup with pacemaker, after stopping >> pacemaker on >> >> > all >> >> > >>> > three >> >> > >>> > > > nodes (simulate a complete shutdown), and then if starting >> all of >> >> > them >> >> > >>> > > > simultaneously, there is a good chance that the whole DB >> content >> >> > gets >> >> > >>> > lost. >> >> > >>> > > > >> >> > >>> > > > After studying the replication code, it seems there is a >> phase >> >> > that the >> >> > >>> > > > backup node deletes all its data and wait for data to be >> synced >> >> > from the >> >> > >>> > > > active node: >> >> > >>> > > > >> >> > >> https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306 >> >> > >>> > > > >> >> > >>> > > > At this state, if the node was set to active, then all >> data is >> >> > gone for >> >> > >>> > the >> >> > >>> > > > whole cluster. This can happen in different situations. In >> the >> >> > test >> >> > >>> > > > scenario mentioned above it is very likely to happen, since >> >> > pacemaker >> >> > >>> > just >> >> > >>> > > > randomly select one as master, not knowing
Re: [ovs-discuss] Possible data loss of OVSDB active-backup mode
To add on , we are using LB VIP IP and no constraint with 3 nodes as Han mentioned earlier where active node have syncs from invalid IP and rest two nodes sync from LB VIP IP. Also, I was able to get some logs from one node that triggered: https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp: 10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination) 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp: 10.189.208.21:56160: disconnecting (removing _Server database due to server termination) 20 I am not sure if sync_from on active node too via some invalid ip is causing some flaw when all are down during the race condition in this corner case. On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique wrote: > > > On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff wrote: > >> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote: >> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff wrote: >> > > >> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote: >> > > > Hi, >> > > > >> > > > We found an issue in our testing (thanks aginwala) with >> active-backup >> > mode >> > > > in OVN setup. >> > > > In the 3 node setup with pacemaker, after stopping pacemaker on all >> > three >> > > > nodes (simulate a complete shutdown), and then if starting all of >> them >> > > > simultaneously, there is a good chance that the whole DB content >> gets >> > lost. >> > > > >> > > > After studying the replication code, it seems there is a phase that >> the >> > > > backup node deletes all its data and wait for data to be synced >> from the >> > > > active node: >> > > > >> https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306 >> > > > >> > > > At this state, if the node was set to active, then all data is gone >> for >> > the >> > > > whole cluster. 
This can happen in different situations. In the test >> > > > scenario mentioned above it is very likely to happen, since >> pacemaker >> > just >> > > > randomly select one as master, not knowing the internal sync state >> of >> > each >> > > > node. It could also happen when failover happens right after a new >> > backup >> > > > is started, although less likely in real environment, so starting up >> > node >> > > > one by one may largely reduce the probability. >> > > > >> > > > Does this analysis make sense? We will do more tests to verify the >> > > > conclusion, but would like to share with community for discussions >> and >> > > > suggestions. Once this happens it is very critical - even more >> serious >> > than >> > > > just no HA. Without HA it is just control plane outage, but this >> would >> > be >> > > > data plane outage because OVS flows will be removed accordingly >> since >> > the >> > > > data is considered as deleted from ovn-controller point of view. >> > > > >> > > > We understand that active-standby is not the ideal HA mechanism and >> > > > clustering is the future, and we are also testing the clustering >> with >> > the >> > > > latest patch. But it would be good if this problem can be addressed >> with >> > > > some quick fix, such as keep a copy of the old data somewhere until >> the >> > > > first sync finishes? >> > > >> > > This does seem like a plausible bug, and at first glance I believe >> that >> > > you're correct about the race here. I guess that the correct behavior >> > > must be to keep the original data until a new copy of the data has >> been >> > > received, and only then atomically replace the original by the new. >> > > >> > > Is this something you have time and ability to fix? >> > >> > Thanks Ben for quick response. 
I guess I will not have time until I send >> > out next series for incremental processing :) >> > It would be good if someone can help and then please reply this email if >> > he/she starts working on it so that we will not end up with overlapping >> > work. >> > > I will give a shot at fixing this i
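Ben's suggested fix above -- keep the original data until a full copy has been received, and only then atomically replace it -- is the classic stage-then-rename pattern. A file-level sketch for illustration only: the real fix belongs in ovsdb's replication code, and fetch_snapshot is a hypothetical stand-in for the sync step.

```shell
# Stage-then-rename sketch: the old DB file is never touched until the
# new copy is complete, so a crash or premature promotion mid-sync
# cannot leave an empty database behind.
atomic_replace() {
    db=$1
    stage=$(mktemp "${db}.sync.XXXXXX") || return 1
    if fetch_snapshot > "$stage"; then
        mv "$stage" "$db"    # rename(2) is atomic within one filesystem
    else
        rm -f "$stage"       # sync failed: the old data stays intact
        return 1
    fi
}
```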
Re: [ovs-discuss] ovs-appctl to monitor HVs sb connection status
Thanks Ben and Han for the suggestions and clarification: So will stick around with ovs-appctl -t ovn-controller rconn/show for individual HVs considering current scope. For checking all HVs connection stats from central node, we can pick it up as a new feature going further. On Mon, Jul 9, 2018 at 8:30 PM Ben Pfaff wrote: > On Mon, Jul 09, 2018 at 06:12:11PM -0700, Han Zhou wrote: > > On Mon, Jul 9, 2018 at 3:37 PM, Ben Pfaff wrote: > > > > > > On Sun, Jul 08, 2018 at 01:09:12PM -0700, aginwala wrote: > > > > As per discussions in past OVN meetings regarding ovn monitoring > stand > > > > point, need some clarity from design perspective. I am thinking of > below > > > > approaches: > > > > > > > > 1. Can we implement something like ovs-appctl -t > > chassis-conn/list > > > > that will show all HVs stats (connected/non-connected)? > > > > > > You're interested particularly in which chassis are connected to the > ovn > > > southbound database? The db server only knows who is connected to it > if > > > they provide SSL certificates. It might not be too hard to get it to > > > report the common name (CN) of the SSL certificates for the clients > > > connected to it. Would that suffice? > > > > > > > 2. or on individual HVs using ovs-appctl -t ovn-controller > > > > chassis-conn/list ? > > > > > > The HVs definitely don't know who is connected to the sbdb server. > > > ___ > > > discuss mailing list > > > disc...@openvswitch.org > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > > > > Discussion in the meeting was about showing connection status to SB-DB > on a > > given HV, and for that HV only. > > So I think a new command like "ovs-appctl -t ovn-controller rconn/show" > > should be enough. > > I forgot the context. If that's enough, it will work. > > We don't currently have a good way to do general-purpose monitoring or > configuration of ovn-controller. What little we do now, we do through > the ovs-vswitchd database. 
If we want something more extensive, maybe > it should have its own hypervisor-local database. > > > The suggestion from Ben is a good one if we are trying to check all HV > > connections status from central node point of view. > > I don't think it should be very hard (but sometimes OpenSSL makes easy > things difficult). > ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
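Besides the SSL-CN idea Ben mentions above, a rough central-node glimpse is already possible today by diffing the expected hypervisor list against the SB Chassis table. This is a sketch and an approximation: Chassis rows show *registration*, not live connection state, so it is not a substitute for the proposed chassis-conn/list command.

```shell
# Central-node sketch: report expected hypervisors that have no row in
# the SB Chassis table.  Registration only, not a live-connection check.
missing_chassis() {
    # $1: expected hostnames (one per line)
    # $2: registered hostnames, e.g. from:
    #     ovn-sbctl --bare --columns=hostname list chassis
    exp=$(mktemp); reg=$(mktemp)
    printf '%s\n' "$1" | sort > "$exp"
    printf '%s\n' "$2" | sort > "$reg"
    comm -23 "$exp" "$reg"   # lines only in the expected list
    rm -f "$exp" "$reg"
}
```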
[ovs-discuss] ovs-appctl to monitor HVs sb connection status
Hi: As per discussions in past OVN meetings regarding the OVN monitoring standpoint, I need some clarity from a design perspective. I am thinking of the below approaches:

1. Can we implement something like ovs-appctl -t chassis-conn/list that will show the stats (connected/not connected) for all HVs?
2. Or, on individual HVs, use ovs-appctl -t ovn-controller chassis-conn/list?

For now, we can manually verify in a couple of ways on an individual HV using netstat, controller logs, etc. However, it makes sense to have some feature to quickly get a glimpse of what's connected and what's not in a large environment with many HVs, instead of checking each chassis individually. Hence, I need some agreement/suggestions before starting the implementation, along with any other alternatives you may have in mind. Regards,
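The manual per-HV verification mentioned above can at least be wrapped in a small helper. A sketch, with assumptions: the SB DB is on the conventional port 6642, and the input is plain `netstat -tn`-style output.

```shell
# Per-hypervisor spot check -- counts ESTABLISHED sessions to the SB DB
# in `netstat -tn`-style output.  Port 6642 is the conventional OVN
# southbound port; adjust SB_PORT if your deployment differs.
SB_PORT=${SB_PORT:-6642}

sb_session_count() {
    # $1: netstat output; prints the number of established SB sessions
    printf '%s\n' "$1" | grep ESTABLISHED | grep -c ":${SB_PORT} "
}

# usage on an HV:  sb_session_count "$(netstat -tn)"
```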
Re: [ovs-discuss] Question to OVN DB pacemaker script
On Fri, May 11, 2018 at 5:21 PM, Han Zhou <zhou...@gmail.com> wrote:
> Thanks for the output. It appears to be more complex than I thought
> before. It is good that the new slave doesn't listen on 6641, although I am
> not sure how is it achieved. I guess a stop has been triggered
> instead of simply demote, but I need to spend some time on the pacemaker
> state machine. And please ignore my comment about calling
> ovsdb_server_start() in demote - it would cause recursive call since
> ovsdb_server_start() calls demote(), too.
>
> Regarding the change:
> if [ "x${present_master}" = x ]; then
> +set $@ --db-nb-create-insecure-remote=yes
> +set $@ --db-sb-create-insecure-remote=yes
> # No master detected, or the previous master is not among the
> # set starting.

>>> Sure. Makes sense. Thanks for review.

> This "if" branch is when there is no master present, but in fact we want
> it to be set when current node is master. So this change doesn't affect
> anything. It is the below change that made the test work (so that on slave
> node the tcp port is not opened):
> elif [ ${present_master} != ${host_name} ]; then
> +set $@ --db-nb-create-insecure-remote=no
> +set $@ --db-sb-create-insecure-remote=no
>
> The error log of ovsdb should not be skipped. We should never bind the LB
> VIP on the ovsdb socket because it is not on the host. I think it is
> related to the code in ovsdb_server_notify():
> ovn-nbctl -- --id=@conn_uuid create Connection \
> target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
> inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid

>>>> Thanks for the pointer. I am able to fix the socket util error by skipping the target for both the nb and sb db LB use cases. It also gets stamped if we use the virtual IPaddr2 heartbeat OCF resource with the existing feature using an L2 VIP IP under the same subnet, so it needs to be skipped for both cases anyway. Do we need to handle that in the same commit or in a different one?
+if [ "x${LISTEN_ON_MASTER_IP_ONLY}" = xyes ]; then
+ovn-nbctl -- --id=@conn_uuid create Connection \
+inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
+else

> When using LB, we should set 0.0.0.0 here.
>
> Also, the failed action is a concern. We may dig more on the root cause.
> Thanks for finding these issues.

>>>> For crm move I am now able to see the actual error that resulted in the failed action, where the move triggers self-replication for a bit:

*2018-05-12T20:08:21.687Z|00021|ovsdb_error|ERR|unexpected ovsdb error: Server ID check failed: Self replicating is not allowed*

However, functionality is intact. Maybe, due to some race condition in the pacemaker state machine as you pointed out, the crm resource move use case needs to be handled explicitly? However, node reboot, service pacemaker/corosync restart, etc. do not result in self-replicating issues while promoting the new node. I will also try to see if I can find something more.

> Thanks,
> Han
>
> On Fri, May 11, 2018 at 3:29 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Sure:
>>
>> *VIP_ip* = 10.149.4.252
>> *LB IP* = 10.149.0.40
>> *slave netstat where it syncs from master LB VIP IP *
>> #netstat -an | grep 6641
>> tcp 0 0 10.169.129.34:47426 10.149.4.252:6641 ESTABLISHED
>> tcp 0 0 10.169.129.34:47444 10.149.4.252:6641 ESTABLISHED
>>
>> *Slave OVS:*
>> # ps aux | grep ovsdb-server
>> root 7388 0.0 0.0 18048 376 ? Ss 14:08 0:00 ovsdb-server: monitoring pid 7389 (healthy)
>> root 7389 0.0 0.0 18464 4556 ? S 14:08 0:00 ovsdb-server -vconsole:off -vfile:info
>> --log-file=/var/log/openvswitch/ovsdb-server-nb.log
>> --remote=punix:/var/run/openvswitch/ovnnb_db.sock
>> --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl
>> --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections
>> --private-key=db:OVN_Northbound,SSL,private_key
>> --certificate=db:OVN_Northbound,SSL,certificate
>> --ca-cert=db:OVN_Northbound,SSL,ca_cert
>>
--ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols >> --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp: >> 10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db >> root 7397 0.0 0.0 18048 372 ?Ss 14:08 0:00 >> ovsdb-server: monitoring pid 7398 (healthy) >> root 7398 0.0 0.0 18868 5280 ?S14:08 0:01 >> ovsdb-server -vconsole:off -vfile:info >> -
Re: [ovs-discuss] Question to OVN DB pacemaker script
Sure:

*VIP_ip* = 10.149.4.252
*LB IP* = 10.149.0.40

*slave netstat where it syncs from master LB VIP IP*
#netstat -an | grep 6641
tcp 0 0 10.169.129.34:47426 10.149.4.252:6641 ESTABLISHED
tcp 0 0 10.169.129.34:47444 10.149.4.252:6641 ESTABLISHED

*Slave OVS:*
# ps aux | grep ovsdb-server
root 7388 0.0 0.0 18048 376 ? Ss 14:08 0:00 ovsdb-server: monitoring pid 7389 (healthy)
root 7389 0.0 0.0 18464 4556 ? S 14:08 0:00 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-nb.log --remote=punix:/var/run/openvswitch/ovnnb_db.sock --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --sync-from=tcp:10.149.4.252:6641 /etc/openvswitch/ovnnb_db.db
root 7397 0.0 0.0 18048 372 ? Ss 14:08 0:00 ovsdb-server: monitoring pid 7398 (healthy)
root 7398 0.0 0.0 18868 5280 ? S 14:08 0:01 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log --remote=punix:/var/run/openvswitch/ovnsb_db.sock --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --sync-from=tcp:10.149.4.252:6642 /etc/openvswitch/ovnsb_db.db

*Master netstat where the connection is established with the LB:*
netstat -an | grep 6641
tcp 0 0 0.0.0.0:6641 0.0.0.0:* LISTEN
tcp 0 0 10.169.129.33:6641 10.149.0.40:47426 ESTABLISHED
tcp 0 0 10.169.129.33:6641 10.149.0.40:47444 ESTABLISHED

*Master OVS:*
# ps aux | grep ovsdb-server
root 3318 0.0 0.0 12940 1012 pts/0 S+ 15:23 0:00 grep --color=auto ovsdb-server
root 11648 0.0 0.0 18048 372 ? Ss 14:08 0:00 ovsdb-server: monitoring pid 11649 (healthy)
root 11649 0.0 0.0 18312 4208 ? S 14:08 0:01 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-nb.log --remote=punix:/var/run/openvswitch/ovnnb_db.sock --pidfile=/var/run/openvswitch/ovnnb_db.pid --unixctl=ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers --remote=ptcp:6641:0.0.0.0 --sync-from=tcp:192.0.2.254:6641 /etc/openvswitch/ovnnb_db.db
root 11657 0.0 0.0 18048 376 ? Ss 14:08 0:00 ovsdb-server: monitoring pid 11658 (healthy)
root 11658 0.0 0.0 19340 5552 ? S 14:08 0:01 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log --remote=punix:/var/run/openvswitch/ovnsb_db.sock --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:0.0.0.0 --sync-from=tcp:192.0.2.254:6642 /etc/openvswitch/ovnsb_db.db

Same is for 6642 for the sb db. Hope it's clear. Sorry I did not post this in the previous message as I thought you already got the point :).

Regards,
Aliasgar

On Fri, May 11, 2018 at 3:16 PM, Han Zhou <zhou...@gmail.com> wrote:
> Ali, could you share output of "ps | grep ovsdb" and "netstat -lpn | grep
> 6641" on the new slave node after you do "crm resource move"?
> > On Fri, May 11, 2018 at 2:25 PM, aginwala <aginw...@asu.edu> wrote: > >> Thanks Han for more suggestions: >> >> >> I did test failover by gracefully stopping pacemaker+corosync on master >> node along with crm move and it works as expected too as crm move is >> triggering promote of new master and hence the new master gets elected >> along with slave getting demoted as expected to listen on sync-from node. >> Hence, whatever code change I posted earlier is well and good. >> >> # crm stat >> Stack: corosync >> Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with >> quorum >> 2 nodes and 2 resources configured >&
Re: [ovs-discuss] Question to OVN DB pacemaker script
Thanks Han for more suggestions:

I did test failover by gracefully stopping pacemaker+corosync on the master node, as well as with crm move, and it works as expected too: crm move triggers promote of the new master, so a new master gets elected and the slave is demoted as expected, listening on the sync-from node. Hence, the code change I posted earlier holds up.

# crm stat
Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ test-pace2-2365308 ]
     Slaves: [ test-pace1-2365293 ]

#crm --debug resource move ovndb_servers test-pace1-2365293
DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (70404b0)]
DEBUG: found pacemaker version: 1.1.14
DEBUG: invoke: crm_resource --quiet --move -r 'ovndb_servers' --node='test-pace1-2365293'

# crm stat
Stack: corosync
Current DC: test-pace1-2365293 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ test-pace1-2365293 test-pace2-2365308 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ test-pace1-2365293 ]
     Slaves: [ test-pace2-2365308 ]

Failed Actions:
* ovndb_servers_monitor_1 on test-pace2-2365308 'master' (8): call=46, status=complete, exitreason='none', last-rc-change='Fri May 11 14:08:35 2018', queued=0ms, exec=83ms

Note: the Failed Actions warning only appears for the crm move command, not when using reboot/kill/service pacemaker/corosync stop/start. I cleaned up the warning using the below command:

#crm_resource -P
Waiting for 1 replies from the CRMd.
OK

Also wanted to call out a finding: ocf_attribute_target is not getting called, per the pacemaker logs. The code says it will not work for older pacemaker versions, though I'm not sure exactly which versions; I am on version 1.1.14.

# pacemaker logs
notice: operation_finished: ovndb_servers_monitor_1:7561:stderr [ /usr/lib/ocf/resource.d/ovn/ovndb-servers: line 31: ocf_attribute_target: command not found ]

Also, the nb db logs are showing socket_util errors, which I think need a code change too to skip stamping it, as functionality is still working as expected (maybe in a separate commit, since it's an ovsdb change):

2018-05-11T21:14:25.958Z|00560|socket_util|ERR|6641:10.149.4.252: bind: Cannot assign requested address
2018-05-11T21:14:25.958Z|00561|socket_util|ERR|6641:10.149.4.252: bind: Cannot assign requested address
2018-05-11T21:14:27.859Z|00562|socket_util|ERR|6641:10.149.4.252: bind: Cannot assign requested address

Let me know of any further suggestions.

Regards,
Aliasgar

On Thu, May 10, 2018 at 3:49 PM, Han Zhou <zhou...@gmail.com> wrote:
> Good progress!
>
> I think at least one more change is needed to ensure when demote happens,
> the TCP port is shut down. Otherwise, the LB will be confused again and
> can't figure out which one is active. This is the graceful failover
> scenario which can be tested by crm resource move instead of reboot/killing
> process.
>
> This may be done by the same approach you did for promote, i.e. stop ovsdb
> and then call ovsdb_server_start() so the parameters are reset correctly
> before starting. Alternatively we can add a command in ovsdb-server, in
> addition to the commands that switches to/from active/backup modes, to
> open/close the TCP ports, to avoid restarting during failover, but I am not
> sure if this is valuable. It depends on whether restarting ovsdb-server
> during failover is sufficient enough. Could you add the restart logic for
> demote and try more? Thanks!
> > Thanks, > Han > > On Thu, May 10, 2018 at 1:54 PM, aginwala <aginw...@asu.edu> wrote: > >> Hi : >> >> Just to further update, I am able to re-open tcp port for failover >> scenario when new master is getting promoted with additional code changes >> as below which do require stop of ovs service on the new selected master to >> reset the tcp settings: >> >> >> diff --git a/ovn/utilities/ovndb-servers.ocf >> b/ovn/utilities/ovndb-servers.ocf >> index 164b6bc..8cb4c25 100755 >> --- a/ovn/utilities/ovndb-servers.ocf >> +++ b/ovn/utilities/ovndb-servers.ocf >> @@ -295,8 +295,8 @@ ovsdb_server_start() { >> >> set ${OVN_CTL} >> >> -set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT} >> -set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT} >> +set $@ --db-nb-port=${NB_MASTER_PORT} >> +set $@ --db-sb-port=${SB_MASTER_PORT} >> >> if [ "x${NB_MASTER_PROTO}" = xtcp ]; then >> set $@ --db-nb-create-insecure-remote=yes >> @@ -307,6 +307,8 @@ ovsdb_server_start() {
Re: [ovs-discuss] Question to OVN DB pacemaker script
Hi: Just to further update, I am able to re-open the tcp port in the failover scenario when the new master is getting promoted, with the additional code changes below, which require stopping the ovs service on the newly selected master to reset the tcp settings:

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..8cb4c25 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

     set ${OVN_CTL}

-    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+    set $@ --db-nb-port=${NB_MASTER_PORT}
+    set $@ --db-sb-port=${SB_MASTER_PORT}

     if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
         set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
     fi

     if [ "x${present_master}" = x ]; then
+        set $@ --db-nb-create-insecure-remote=yes
+        set $@ --db-sb-create-insecure-remote=yes
         # No master detected, or the previous master is not among the
         # set starting.
         #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
         set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS} --db-sb-sync-from-addr=${INVALID_IP_ADDRESS}
     elif [ ${present_master} != ${host_name} ]; then
+        set $@ --db-nb-create-insecure-remote=no
+        set $@ --db-sb-create-insecure-remote=no
         # An existing master is active, connect to it
         set $@ --db-nb-sync-from-addr=${MASTER_IP} --db-sb-sync-from-addr=${MASTER_IP}
         set $@ --db-nb-sync-from-port=${NB_MASTER_PORT}
@@ -416,6 +420,8 @@ ovsdb_server_promote() {
         ;;
     esac

+    ${OVN_CTL} stop_ovsdb
+    ovsdb_server_start
     ${OVN_CTL} promote_ovnnb
     ${OVN_CTL} promote_ovnsb

Below are the scenarios tested:

Master | Slave | Scenario | Result
- | - | reboot/failure | New master gets promoted with tcp ports enabled to start taking LB traffic.
- | - | reboot/failure | No change; the current master continues taking traffic, with the slave continuing to sync from master.
- | - | reboot/failure | New master gets promoted with tcp ports enabled to start taking LB traffic.
Also, sync on the slave from the master works as expected:

# On master
ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add 556

# On the slave, the port is shut down as expected
ovn-nbctl --db=tcp:10.169.129.34:6641 show
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)

# On the slave's local unix socket, the lswitch 556 above gets replicated too, via --sync-from=tcp:10.149.4.252:6641
ovn-nbctl show
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)

# Same testing for the sb db
# Slave port 6642 is shut down too; this hangs:
ovn-sbctl --db=tcp:10.169.129.34:6642 show

# Using the master ip works
ovn-sbctl --db=tcp:10.169.129.33:6642 show
Chassis "21f12bd6-e9e8-4ee2-afeb-28b331df6715"
    hostname: "test-pace2-2365308.lvs02.dev.ebayc3.com"
    Encap geneve
        ip: "10.169.129.34"
        options: {csum="true"}

# Accessing via the LB vip works fine too, as only one member is active:
for i in `seq 1 500`; do ovn-sbctl --db=tcp:10.149.4.252:6642 show; done
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)
switch 2bd07b67-fd6b-401d-9612-da75e8f9ffc8 (556)

Everything works fine as expected. Let me know of any corner case I missed. I will submit a formal patch using LISTEN_ON_MASTER_IP_ONLY for the LB-with-tcp use case, to avoid breaking existing functionality.
Regards, Aliasgar On Thu, May 10, 2018 at 9:55 AM, aginwala <aginw...@asu.edu> wrote: > Thanks folks for suggestions: > > For LB vip configurations, I did the testing further and yes it does > tries to hit the slave db as per the logs below and fails as slave do not > have write permission of which LB is not aware of: > for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add > $i590;done > ovn-nbctl: transaction error: {"details":"insert operation not allowed > when database server is in read only mode","error":"not allowed"} > ovn-nbctl: transaction error: {"details":"insert operation not allowed > when database server is in read only mode","error":"not allowed"} > ovn-nbctl: transaction error: {"details":"insert operation not allowed > when database server is in read only mode","error":"not allowed"} > > Hence, with little more code changes(in the same patch without the flag > variable suggestion), I am able to shutdown the tcp port on the slave and > it works fine as below: > #Master Node > # ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444 > #Slave Node > # ovn-nbctl --db=t
Re: [ovs-discuss] Question to OVN DB pacemaker script
Thanks folks for the suggestions:

For the LB vip configuration, I did further testing, and yes, the LB does try to hit the slave db, per the logs below, and the request fails since the slave does not have write permission (which the LB is not aware of):

for i in `seq 1 500`; do ovn-nbctl --db=tcp:10.149.4.252:6641 ls-add $i590;done
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}

Hence, with a little more code change (in the same patch, without the flag variable suggestion), I am able to shut down the tcp port on the slave, and it works fine as below:

#Master Node
# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444

#Slave Node
# ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)

Code to shut down the tcp port on the slave db, so that only the master listens on the tcp ports:

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..b265df6 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {

     set ${OVN_CTL}

-    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+    set $@ --db-nb-port=${NB_MASTER_PORT}
+    set $@ --db-sb-port=${SB_MASTER_PORT}

     if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
         set $@ --db-nb-create-insecure-remote=yes
@@ -307,6 +307,8 @@ ovsdb_server_start() {
     fi

     if [ "x${present_master}" = x ]; then
+        set $@ --db-nb-create-insecure-remote=yes
+        set $@ --db-sb-create-insecure-remote=yes
         # No master detected, or the previous master is not among the
         # set starting.
         #
@@ -316,6 +318,8 @@ ovsdb_server_start() {
         set $@ --db-nb-sync-from-addr=${INVALID_IP_ADDRESS} --db-sb-sync-from-addr=${INVALID_IP_ADDR
     elif [ ${present_master} != ${host_name} ]; then
+        set $@ --db-nb-create-insecure-remote=no
+        set $@ --db-sb-create-insecure-remote=no

But I noticed that if the slave becomes active post-failover, after the active node reboots or fails, pacemaker shows it online but I am not able to access the dbs.

# crm status
Online: [ test-pace2-2365308 ]
OFFLINE: [ test-pace1-2365293 ]

Full list of resources:

 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ test-pace2-2365308 ]
     Stopped: [ test-pace1-2365293 ]

# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.33:6641: database connection failed (Connection refused)
# ovn-nbctl --db=tcp:10.169.129.34:6641 ls-add test444
ovn-nbctl: tcp:10.169.129.34:6641: database connection failed (Connection refused)

Hence, when failover happens, the slave is already running with --sync-from=lbVIP:6641/6642 for the nb and sb db respectively, so the re-opening of the tcp ports for the nb and sb db on the slave being promoted to master does not happen automatically. Let me know if there is a valid way/approach I am missing to handle this during the slave promote logic; I will do further code changes accordingly.

Note: the current code changes for use with an LB will need to handle ssl too. I will have to handle that separately, but I want to get tcp working first, and we can add ssl support later.

Regards,
Aliasgar

On Wed, May 9, 2018 at 12:19 PM, Numan Siddique <nusid...@redhat.com> wrote:
>
>
> On Thu, May 10, 2018 at 12:44 AM, Han Zhou <zhou...@gmail.com> wrote:
>
>>
>>
>> On Wed, May 9, 2018 at 11:51 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, May 10, 2018 at 12:15 AM, Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>> Thanks Ali for the quick patch. Please see my comments inline.
>>>> >>>> On Wed, May 9, 2018 at 9:30 AM, aginwala <aginw...@asu.edu> wrote: >>>> > >>>> > Thanks Han and Numan for the clarity to help sort it out. >>>> > >>>> > For making vip work with using LB in my two node setup, I had changed >>>> below code to skip setting master IP when creating pcs resource for ovndbs >>>> and listen on 0.0.0.0 instead. Hence, the discussion seems inline with the >>>> code change which is small for sure as below: >>>> > >>>> > >>>> > diff --git a/ovn/utilities/ovndb-servers.ocf >>&g
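One hedged way to close the promote-time gap described above would be to re-open the TCP remotes when pacemaker promotes the slave. The sketch below is purely illustrative: the function name `ovsdb_server_promote` and the `*_MASTER_PROTO`/`*_MASTER_PORT` variables are assumed from the same ovndb-servers.ocf file, and `set-connection` writes the Connection table of the now read-write database.

```shell
# Hypothetical addition to the promote path of ovn/utilities/ovndb-servers.ocf.
# ovsdb_server_promote() and the variables below are assumptions based on the
# rest of the script; "set-connection ptcp:PORT" asks the newly promoted
# (read-write) db to listen on that TCP port again.
ovsdb_server_promote() {
    # ... existing promote logic (ovn-ctl promote_ovnnb / promote_ovnsb) ...
    if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
        ovn-nbctl set-connection ptcp:${NB_MASTER_PORT}
    fi
    if [ "x${SB_MASTER_PROTO}" = xtcp ]; then
        ovn-sbctl set-connection ptcp:${SB_MASTER_PORT}
    fi
}
```

This is an OCF resource-agent fragment, not a standalone script; the equivalent could also be done with the --db-*-create-insecure-remote flags if the promote path restarts the servers.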
Re: [ovs-discuss] Question to OVN DB pacemaker script
Thanks Han and Numan for the clarity to help sort it out.

For making the VIP work using an LB in my two-node setup, I had changed the below code to skip setting the master IP when creating the pcs resource for the OVN dbs and to listen on 0.0.0.0 instead. Hence, the discussion seems in line with the code change, which is small for sure, as below:

diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index 164b6bc..d4c9ad7 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -295,8 +295,8 @@ ovsdb_server_start() {
     set ${OVN_CTL}

-    set $@ --db-nb-addr=${MASTER_IP} --db-nb-port=${NB_MASTER_PORT}
-    set $@ --db-sb-addr=${MASTER_IP} --db-sb-port=${SB_MASTER_PORT}
+    set $@ --db-nb-port=${NB_MASTER_PORT}
+    set $@ --db-sb-port=${SB_MASTER_PORT}

     if [ "x${NB_MASTER_PROTO}" = xtcp ]; then
         set $@ --db-nb-create-insecure-remote=yes

Results:

# accessing via LB VIP
ovn-nbctl --db=tcp:10.149.7.56:6641 show
switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
ovn-nbctl --db=tcp:10.149.7.56:6641 ls-add ls55

# accessing via active node pool member
root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 show
switch bb130c99-a00d-43cf-b40a-9c6fb1df5ed7 (ls666)
switch 41922d23-3430-436d-b67a-00422367a653 (ls55)

# accessing using standby node pool member
root@test-pace2-2365308:~# ovn-nbctl --db=tcp:10.169.129.33:6641 ls-add lss
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database serv

# using a connect string and skipping the VIP resource, just for reading the db and not for writing
ovn-nbctl --db=tcp:10.169.129.34:6641,tcp:10.169.129.33:6641 show

I am pointing northd and ovn-controller to the db VIP, which works as expected too. For northd, we can use the local unix socket as well, which is valid, as I have tested both ways by keeping it running on both nodes. I think it's just a personal preference to use the VIP or the unix socket since both are valid for northd. I think we might need to update the documentation too with the above details.
I will send a formal patch along with a documentation update. Let me know if there are other suggestions too, in case anything is missed.

Regards,
Aliasgar

On Wed, May 9, 2018 at 9:18 AM, Han Zhou wrote:

>
>
> On Wed, May 9, 2018 at 9:02 AM, Numan Siddique
> wrote:
>
>>
>>
>> On Wed, May 9, 2018 at 9:02 PM, Han Zhou wrote:
>>
>>> Hi Numan,
>>>
>>> Thank you so much for the detailed answer! Please see my comments
>>> inline.
>>>
>>> On Wed, May 9, 2018 at 7:41 AM, Numan Siddique
>>> wrote:

Hi Han,

Please see below for inline comments.

On Wed, May 9, 2018 at 5:17 AM, Han Zhou wrote:

> Hi Babu/Numan,
>
> I have a question regarding the OVN pacemaker OCF script.
> I see in the script MASTER_IP is used to start the active DB and
> standby DBs will use that IP to sync from.
>
> In the Documentation/topics/integration.rst it is also mentioned:
>
> `master_ip` is the IP address on which the active database server is
> expected to be listening, the slave node uses it to connect to the master
> node.
>
> However, since the active node will change after failover, I wonder if we
> should provide all the IPs of each node, and let pacemaker decide which
> IP is the master IP to be used, dynamically.
>
> I see in the documentation it is mentioned about using the IPAddr2
> resource for virtual IP. Does it indicate that we should use the virtual IP
> as the master IP?
>

That is true. If the master ip is not a virtual ip, then we will not be able to figure out which is the master node. We need to configure networking-ovn and ovn-controller to point to the right master node so that they can do write transactions on the DB.
Below is how we have configured pacemaker OVN HA dbs in tripleo openstack deployment - Tripleo deployment creates many virtual IPs (using IPAddr2) and these IP addresses are frontend IPs for keystone and all other openstack API services and haproxy is used to load balance the traffic (the deployment will mostly have 3 controllers and all the openstack API services will be running on each node). - We choose one of the IPaddr2 virtual ip and we set a colocation constraint when creating the OVN pacemaker HA db resource i.e we ask pacemaker to promote the ovsdb-servers running in the node configured with the virtual ip (i.e master_ip). Pacemaker will call the promote action [1] on the node where master ip is configured. - tripleo configures "ovn_nb_connection=tcp:VIP:6641" and " ovn_sb_connection=tcp:VIP:6642" in neutron.conf and runs "ovs-vsctl set open . external_ids:ovn-remote=tcp:VIP:6642" on all the nodes where ovn-controller service is
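As a concrete illustration of the colocation approach described above, the constraint can be expressed with pcs roughly as follows. This is a sketch only: the resource names `ip-192.0.2.10` and `ovndb_servers-master`, and the VIP itself, are placeholders — substitute your deployment's own names.

```shell
# Hypothetical pcs commands (placeholder names).  Create the virtual IP
# resource, then pin the master (promoted) instance of the OVN db resource
# to whichever node currently holds that VIP.
pcs resource create ip-192.0.2.10 ocf:heartbeat:IPaddr2 ip=192.0.2.10 op monitor interval=30s
pcs constraint colocation add master ovndb_servers-master with ip-192.0.2.10 INFINITY
```

With this in place, clients can always reach the writable master via the VIP, since pacemaker promotes the ovsdb-servers on the VIP's node.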
Re: [ovs-discuss] raft ovsdb clustering
Cool! Yup Makes sense for sandbox northd also to point to clustered nb/sb dbs. On Wed, Apr 4, 2018 at 4:01 PM, Ben Pfaff <b...@ovn.org> wrote: > Oh, I see, from reading further in the thread, that this was indeed a > misunderstanding. Well, in any case that new option to ovs-sandbox can > be useful. > > On Wed, Apr 04, 2018 at 04:00:20PM -0700, Ben Pfaff wrote: > > I would like to support cluster-wide locks. They require extra work and > > they require new OVSDB JSON-RPC protocol design (because locks are > > currently per-server, not per-database). I do not currently have a > > schedule for designing and implementing them. > > > > However, I am surprised that this is an issue for northd. For a > > clustered database, ovn-northd always connects to the cluster leader. > > There is at most one leader in the cluster at a given time, so as long > > as ovn-northd obtains a lock on the leader, this should ensure that only > > one ovn-northd is active at a time. There could be brief races, in > > which two ovn-northds believe that they have the lock, but they should > > not persist. > > > > You see different behavior, so there is a bug or a misunderstanding. > > I don't see the same misbehavior, though, when I do a similar test in > > the sandbox. If you apply the patches I just posted: > > https://patchwork.ozlabs.org/patch/895184/ > > https://patchwork.ozlabs.org/patch/895185/ > > then you can try it out with: > > make sandbox SANDBOXFLAGS='--ovn --sbdb-model=clustered > --n-northds=3' > > > > On Wed, Mar 21, 2018 at 01:12:48PM -0700, aginwala wrote: > > > :) The only thing is while using pacemaker, if the node that pacemaker > if > > > pointing to is down, all the active/standby northd nodes have to be > updated > > > to new node from the cluster. But will dig in more to see what else I > can > > > find. > > > > > > @Ben: Any suggestions further? 
> > > > > > > > > Regards, > > > > > > On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote: > > > > > > > > > > > > > > > On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote: > > > > > > > >> Thanks Numan: > > > >> > > > >> Yup agree with the locking part. For now; yes I am running northd > on one > > > >> node. I might right a script to monitor northd in cluster so that > if the > > > >> node where it's running goes down, script can spin up northd on one > other > > > >> active nodes as a dirty hack. > > > >> > > > >> The "dirty hack" is pacemaker :) > > > > > > > > > > > >> Sure, will await for the inputs from Ben too on this and see how > complex > > > >> would it be to roll out this feature. > > > >> > > > >> > > > >> Regards, > > > >> > > > >> > > > >> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique < > nusid...@redhat.com> > > > >> wrote: > > > >> > > > >>> Hi Aliasgar, > > > >>> > > > >>> ovsdb-server maintains locks per each connection and not across > the db. > > > >>> A workaround for you now would be to configure all the ovn-northd > instances > > > >>> to connect to one ovsdb-server if you want to have active/standy. > > > >>> > > > >>> Probably Ben can answer if there is a plan to support ovsdb locks > across > > > >>> the db. We also need this support in networking-ovn as it also > uses ovsdb > > > >>> locks. > > > >>> > > > >>> Thanks > > > >>> Numan > > > >>> > > > >>> > > > >>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu> > wrote: > > > >>> > > > >>>> Hi Numan: > > > >>>> > > > >>>> Just figured out that ovn-northd is running as active on all 3 > nodes > > > >>>> instead of one active instance as I continued to test further > which results > > > >>>> in db errors as per logs. > > > >>>> > > > >>>> > > > >>>> # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs > in > > > >>>> ovn-north > > > &
Re: [ovs-discuss] raft ovsdb clustering
Sure:

#Node1
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=192.168.220.101 --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.101:6645 --db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.220.101 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr=tcp:192.168.220.101:6644 start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

#Node2
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=192.168.220.102 --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.102:6645 --db-nb-cluster-remote-addr="tcp:192.168.220.101:6645" --db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.220.102 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:192.168.220.102:6644" --db-sb-cluster-remote-addr="tcp:192.168.220.101:6644" start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

#Node3
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=192.168.220.103 --db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.103:6645 --db-nb-cluster-remote-addr="tcp:192.168.220.101:6645" --db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.220.103 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:192.168.220.103:6644" --db-sb-cluster-remote-addr="tcp:192.168.220.101:6644" start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" --ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

# export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:192.168.220.101:6641"

# ovn-nbctl show can be done using the command below:
ovn-nbctl --db=$remote show

# ovn-sbctl commands can be run as below:
ovn-sbctl --db=$remote show

Regards,

On Tue, Mar 27, 2018 at 12:08 PM, Numan Siddique <nusid...@redhat.com> wrote:

> Thanks Aliasgar,
>
> I am still facing the same issue.
>
> Can you also share the (ovn-ctl) commands you used to start/join the
> ovsdb-server clusters in your nodes ?
>
> Thanks
> Numan
>
> On Tue, Mar 27, 2018 at 11:04 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Hi Numan:
>>
>> You need to use --db as you are now running the db in a cluster; you can access
>> data from any of the three dbs.
>>
>> So if the leader crashes, it re-elects from the other two. Below is the
>> e.g.
command: >> >> # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp: >> 192.168.220.101:6641" >> # kill -9 3985 >> # ovn-nbctl --db=$remote show >> switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1) >> # ovn-nbctl --db=$remote ls-del ls1 >> >> >> >> >> >> >> >> Hope it helps! >> >> Regards, >> >> >> On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusid...@redhat.com> >> wrote: >> >>> Hi Aliasgar, >>> >>> In your setup, if you kill the leader what is the behaviour ? Are you >>> still able to create or delete any resources ? Is a new leader elected ? >>> >>> In my setup, the command "ovn-nbctl ls-add" for example blocks until I >>> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server >>> becoming leader. May be I have configured wrongly. >>> Could you please test this scenario if not yet please and let me know >>> your observations if possible. >>> >>> Thanks >>> Numan >>> >>> >>> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhou...@gmail.com> wrote: >>> >>>> Sounds good. >>>> >>>> Just checked the patch, by default the C IDL has "leader_only" as true, >>>>
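The comma-separated --db strings used above are easy to mistype when there are three nodes and two ports. A tiny helper (purely illustrative, not part of ovn-ctl) can assemble them from a port and a list of node IPs:

```shell
# Illustrative helper only: build the "tcp:IP:PORT,tcp:IP:PORT,..." string
# passed to ovn-nbctl/ovn-sbctl --db and to northd's --ovnnb-db/--ovnsb-db.
build_remote() {
    port=$1; shift
    out=""
    for ip in "$@"; do
        # append "tcp:IP:PORT", comma-separated after the first entry
        out="${out:+${out},}tcp:${ip}:${port}"
    done
    printf '%s\n' "$out"
}

build_remote 6641 192.168.220.101 192.168.220.102 192.168.220.103
# prints: tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641
```

For example, `remote=$(build_remote 6642 192.168.220.101 192.168.220.102 192.168.220.103)` produces the SB string used with `ovn-sbctl --db=$remote show`.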
Re: [ovs-discuss] raft ovsdb clustering
Hi Numan:

You need to use --db as you are now running the db in a cluster; you can access data from any of the three dbs.

So if the leader crashes, it re-elects from the other two. Below is an example command:

# export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:192.168.220.101:6641"
# kill -9 3985
# ovn-nbctl --db=$remote show
switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
# ovn-nbctl --db=$remote ls-del ls1

Hope it helps!

Regards,

On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusid...@redhat.com> wrote:

> Hi Aliasgar,
>
> In your setup, if you kill the leader what is the behaviour ? Are you
> still able to create or delete any resources ? Is a new leader elected ?
>
> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server
> becoming leader. Maybe I have configured it wrongly.
> Could you please test this scenario, if you have not yet, and let me know
> your observations if possible.
>
> Thanks
> Numan
>
> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhou...@gmail.com> wrote:
>
>> Sounds good.
>>
>> Just checked the patch: by default the C IDL has "leader_only" as true,
>> which ensures that the connection is to the leader only. This is the case for
>> northd. So the lock works for northd's active-standby purpose if all the
>> ovsdb endpoints of a cluster are specified to northd, since all northds are
>> connecting to the same DB, the leader.
>>
>> For neutron networking-ovn, this may not work yet, since I didn't see
>> such logic in the python IDL in the current patch series. It would be good if
>> we add similar logic for the python IDL. (@ben/numan, correct me if I am wrong)
>>
>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala <aginw...@asu.edu> wrote:
>>
>>> Hi:
>>>
>>> Just sorted out the correct settings and northd also works in HA with raft.
>>>
>>> There were 2 issues in the setup:
>>> 1. I had started the nb db without --db-nb-create-insecure-remote
>>> 2.
I also started northd locally on all 3 without remote which is like >>> all three northd trying to lock the ovsdb locally. >>> >>> Hence, the duplicate logs were populated in the southbound datapath due >>> to multiple northd trying to write the local copy. >>> >>> So, I now start nb db with --db-nb-create-insecure-remote and northd on >>> all 3 nodes using below command: >>> >>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp: >>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" >>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp: >>> 10.148.181.162:6642" --no-chdir >>> --log-file=/var/log/openvswitch/ovn-northd.log >>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor >>> >>> >>> #At start, northd went active on the leader node and standby on other >>> two nodes. >>> >>> #After old leader crashed and new leader got elected, northd goes active >>> on any of the remaining 2 nodes as per sample logs below from non-leader >>> node: >>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. >>> This ovn-northd instance is now on standby. >>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock >>> acquired. This ovn-northd instance is now active. >>> >>> # Also ovn-controller works similar way if leader goes down and connects >>> to any of the remaining 2 nodes: >>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642: >>> clustered database server is disconnected from cluster; trying another >>> server >>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642: >>> connection attempt timed out >>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642: >>> waiting 4 seconds before reconnect >>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642: >>> connected >>> >>> >>> >>> Above settings will also work if we put all the nodes behind the vip and >>> updates the ovn configs to use vips. 
So we don't need pacemaker explicitly >>> for northd HA :). >>> >>> Since the setup is complete now, I will populate the same in scale test >>> env and see how it behaves. >>> >>> @Numan: We can try the same with networking-ovn integration and see if >>> we find anything weird there too. Not sure
Re: [ovs-discuss] raft ovsdb clustering
Hi:

Just sorted out the correct settings and northd also works in HA with raft.

There were two issues in the setup:
1. I had started the nb db without --db-nb-create-insecure-remote.
2. I had also started northd locally on all 3 nodes without a remote, which is like all three northds trying to lock the ovsdb locally.

Hence, the duplicate logs were populated in the southbound datapath due to multiple northds trying to write the local copy.

So, I now start the nb db with --db-nb-create-insecure-remote and northd on all 3 nodes using the below command:

ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

# At start, northd went active on the leader node and standby on the other two nodes.

# After the old leader crashed and a new leader got elected, northd goes active on any of the remaining 2 nodes, as per the sample logs below from a non-leader node:
2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.

# Also, ovn-controller works in a similar way: if the leader goes down, it connects to any of the remaining 2 nodes:
2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642: clustered database server is disconnected from cluster; trying another server
2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642: connection attempt timed out
2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642: waiting 4 seconds before reconnect
2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642: connected

The above settings will also work if we put all the nodes behind the VIP and update the OVN configs to use VIPs.
So we don't need pacemaker explicitly for northd HA :). Since the setup is complete now, I will populate the same in scale test env and see how it behaves. @Numan: We can try the same with networking-ovn integration and see if we find anything weird there too. Not sure if you have any exclusive findings for this case. Let me know if something else is missed here. Regards, On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou <zhou...@gmail.com> wrote: > Ali, sorry if I misunderstand what you are saying, but pacemaker here is > for northd HA. pacemaker itself won't point to any ovsdb cluster node. All > northds can point to a LB VIP for the ovsdb cluster, so if a member of > ovsdb cluster is down it won't have impact to northd. > > Without clustering support of the ovsdb lock, I think this is what we have > now for northd HA. Please suggest if anyone has any other idea. Thanks :) > > On Wed, Mar 21, 2018 at 1:12 PM, aginwala <aginw...@asu.edu> wrote: > >> :) The only thing is while using pacemaker, if the node that pacemaker if >> pointing to is down, all the active/standby northd nodes have to be updated >> to new node from the cluster. But will dig in more to see what else I can >> find. >> >> @Ben: Any suggestions further? >> >> >> Regards, >> >> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote: >> >>> >>> >>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote: >>> >>>> Thanks Numan: >>>> >>>> Yup agree with the locking part. For now; yes I am running northd on >>>> one node. I might right a script to monitor northd in cluster so that if >>>> the node where it's running goes down, script can spin up northd on one >>>> other active nodes as a dirty hack. >>>> >>>> The "dirty hack" is pacemaker :) >>> >>> >>>> Sure, will await for the inputs from Ben too on this and see how >>>> complex would it be to roll out this feature. 
>>>> >>>> >>>> Regards, >>>> >>>> >>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusid...@redhat.com> >>>> wrote: >>>> >>>>> Hi Aliasgar, >>>>> >>>>> ovsdb-server maintains locks per each connection and not across the >>>>> db. A workaround for you now would be to configure all the ovn-northd >>>>> instances to connect to one ovsdb-server if you want to have >>>>> active/standy. >>>>> >>>>> Probably Ben can answer if there is a plan to support ovsdb locks >>>>> across the db. We also need this support in networking-ovn as it also
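The client-side failover visible in the ovn-controller logs above — try one endpoint, fail, move to the next — can be sketched generically. This is only an illustration of the pattern; it is not OVN's actual reconnect code, and `try_connect` is a stub standing in for a real OVSDB connection attempt:

```shell
# Illustration of try-next-server failover (not OVN's reconnect code).
try_connect() {
    # stub connection attempt: succeeds unless the endpoint is "down"
    [ "$1" != "$DOWN_ENDPOINT" ]
}

connect_with_failover() {
    # try each endpoint in order; print the first one that accepts
    for ep in "$@"; do
        if try_connect "$ep"; then
            echo "$ep"
            return 0
        fi
    done
    return 1
}

DOWN_ENDPOINT="tcp:10.148.181.162:6642"   # simulate the crashed leader
connect_with_failover tcp:10.148.181.162:6642 tcp:10.169.125.152:6642 tcp:10.169.125.131:6642
# prints: tcp:10.169.125.152:6642
```

In the real IDL, the failed endpoint is retried later with backoff (the "waiting 4 seconds before reconnect" lines above) rather than abandoned.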
Re: [ovs-discuss] raft ovsdb clustering
:) The only thing is, while using pacemaker, if the node that pacemaker is pointing to is down, all the active/standby northd nodes have to be updated to a new node from the cluster. But I will dig in more to see what else I can find.

@Ben: Any suggestions further?

Regards,

On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote:
>
>> Thanks Numan:
>>
>> Yup, agree with the locking part. For now, yes, I am running northd on one
>> node. I might write a script to monitor northd in the cluster so that if the
>> node where it's running goes down, the script can spin up northd on one of the
>> other active nodes as a dirty hack.
>>
>> The "dirty hack" is pacemaker :)
>
>
>> Sure, will await inputs from Ben too on this and see how complex
>> it would be to roll out this feature.
>>
>>
>> Regards,
>>
>>
>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusid...@redhat.com>
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> ovsdb-server maintains locks per connection and not across the db.
>>> A workaround for you now would be to configure all the ovn-northd instances
>>> to connect to one ovsdb-server if you want to have active/standby.
>>>
>>> Probably Ben can answer if there is a plan to support ovsdb locks across
>>> the db. We also need this support in networking-ovn as it also uses ovsdb
>>> locks.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu> wrote:
>>>
>>>> Hi Numan:
>>>>
>>>> Just figured out that ovn-northd is running as active on all 3 nodes
>>>> instead of one active instance as I continued to test further, which results
>>>> in db errors as per logs.
>>>> >>>> >>>> # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs in >>>> ovn-north >>>> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error: >>>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table >>>> to have identical values (1) for index on column \"tunnel_key\". First >>>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by >>>> this transaction. Second row, with UUID >>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683, >>>> existed in the database before this transaction and was not modified by the >>>> transaction.","error":"constraint violation"} >>>> >>>> In southbound datapath list, 2 duplicate records gets created for same >>>> switch. >>>> >>>> # ovn-sbctl list Datapath >>>> _uuid : b270ae30-3458-445f-95d2-b14e8ebddd01 >>>> external_ids: >>>> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", >>>> name="ls2"} >>>> tunnel_key : 2 >>>> >>>> _uuid : 8e06f919-4cc7-4ffc-9a79-20ce6663b683 >>>> external_ids: >>>> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", >>>> name="ls2"} >>>> tunnel_key : 1 >>>> >>>> >>>> >>>> # on nodes 1 and 2 where northd is running, it gives below error: >>>> 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error: >>>> {"details":"cannot delete Datapath_Binding row >>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining >>>> reference(s)","error":"referential integrity violation"} >>>> >>>> As per commit message, for northd I re-tried setting --ovnnb-db="tcp: >>>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" >>>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp: >>>> 10.148.181.162:6642" and it did not help either. >>>> >>>> There is no issue if I keep running only one instance of northd on any >>>> of these 3 nodes. Hence, wanted to know is there something else >>>> missing here to make only one northd instance as active and rest as >>>> standby? 
>>>> >>>> >>>> Regards, >>>> >>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <nusid...@redhat.com> >>>> wrote: >
Re: [ovs-discuss] raft ovsdb clustering
Thanks Numan:

Yup, agree with the locking part. For now, yes, I am running northd on one node. I might write a script to monitor northd in the cluster so that if the node where it's running goes down, the script can spin up northd on one of the other active nodes as a dirty hack.

Sure, will await inputs from Ben too on this and see how complex it would be to roll out this feature.

Regards,

On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusid...@redhat.com> wrote:

> Hi Aliasgar,
>
> ovsdb-server maintains locks per connection and not across the db. A
> workaround for you now would be to configure all the ovn-northd instances
> to connect to one ovsdb-server if you want to have active/standby.
>
> Probably Ben can answer if there is a plan to support ovsdb locks across
> the db. We also need this support in networking-ovn as it also uses ovsdb
> locks.
>
> Thanks
> Numan
>
> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu> wrote:
>
>> Hi Numan:
>>
>> Just figured out that ovn-northd is running as active on all 3 nodes
>> instead of one active instance as I continued to test further, which results
>> in db errors as per logs.
>>
>> # on node 3, I ran ovn-nbctl ls-add ls2; it populates the below logs in
>> ovn-northd:
>> 2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>> to have identical values (1) for index on column \"tunnel_key\". First
>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by
>> this transaction. Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>> existed in the database before this transaction and was not modified by the
>> transaction.","error":"constraint violation"}
>>
>> In the southbound datapath list, two duplicate records get created for the same
>> switch.
>> >> # ovn-sbctl list Datapath >> _uuid : b270ae30-3458-445f-95d2-b14e8ebddd01 >> external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", >> name="ls2"} >> tunnel_key : 2 >> >> _uuid : 8e06f919-4cc7-4ffc-9a79-20ce6663b683 >> external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", >> name="ls2"} >> tunnel_key : 1 >> >> >> >> # on nodes 1 and 2 where northd is running, it gives below error: >> 2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error: >> {"details":"cannot delete Datapath_Binding row >> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining >> reference(s)","error":"referential integrity violation"} >> >> As per commit message, for northd I re-tried setting --ovnnb-db="tcp: >> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" >> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp: >> 10.148.181.162:6642" and it did not help either. >> >> There is no issue if I keep running only one instance of northd on any of >> these 3 nodes. Hence, wanted to know is there something else missing >> here to make only one northd instance as active and rest as standby? >> >> >> Regards, >> >> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <nusid...@redhat.com> >> wrote: >> >>> That's great >>> >>> Numan >>> >>> >>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginw...@asu.edu> wrote: >>> >>>> Hi Numan: >>>> >>>> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with >>>> fresh installation and it worked super fine for both sb and nb dbs. Seems >>>> like some kernel issue on the previous nodes when I re-installed raft patch >>>> as I was running different ovs version on those nodes before. >>>> >>>> >>>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp: >>>> 10.169.125.131:6642, tcp:10.148.181.162:6642" and started controller >>>> and it works super fine. 
>>>> >>>> >>>> Did some failover testing by rebooting/killing the leader ( >>>> 10.169.125.152) and bringing it back up and it works as expected. >>>> Nothing weird noted so far. >>>> >>>> # check-cluster gives below data one of the node(10.148.181.162) post >>>> leader failure >>>> >>>> ovsdb-tool check-cluster /et
Re: [ovs-discuss] raft ovsdb clustering
Hi Numan:

Just figured out that ovn-northd is running as active on all 3 nodes instead of one active instance as I continued to test further, which results in db errors as per the logs.

# on node 3, I ran ovn-nbctl ls-add ls2; it populates the below logs in ovn-northd:
2018-03-21T06:01:59.442Z|7|ovsdb_idl|WARN|transaction error: {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table to have identical values (1) for index on column \"tunnel_key\". First row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted by this transaction. Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

In the southbound datapath list, two duplicate records get created for the same switch.

# ovn-sbctl list Datapath
_uuid : b270ae30-3458-445f-95d2-b14e8ebddd01
external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
tunnel_key : 2

_uuid : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
external_ids: {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
tunnel_key : 1

# on nodes 1 and 2 where northd is running, it gives the below error:
2018-03-21T06:01:59.437Z|8|ovsdb_idl|WARN|transaction error: {"details":"cannot delete Datapath_Binding row 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining reference(s)","error":"referential integrity violation"}

As per the commit message, for northd I re-tried setting --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642" and it did not help either.

There is no issue if I keep running only one instance of northd on any of these 3 nodes. Hence, I wanted to know if there is something else missing here to make only one northd instance active and the rest standby?
Regards, On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <nusid...@redhat.com> wrote: > That's great > > Numan > > > On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginw...@asu.edu> wrote: > >> Hi Numan: >> >> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu 16.04)with >> fresh installation and it worked super fine for both sb and nb dbs. Seems >> like some kernel issue on the previous nodes when I re-installed raft patch >> as I was running different ovs version on those nodes before. >> >> >> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp: >> 10.169.125.131:6642, tcp:10.148.181.162:6642" and started controller >> and it works super fine. >> >> >> Did some failover testing by rebooting/killing the leader (10.169.125.152) >> and bringing it back up and it works as expected. Nothing weird noted so >> far. >> >> # check-cluster gives below data one of the node(10.148.181.162) post >> leader failure >> >> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db >> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log >> entries only up to index 18446744073709551615, but index 9 was committed in >> a previous term (e.g. by /etc/openvswitch/ovnsb_db.db) >> >> >> For check-cluster, are we planning to add more output showing which node >> is active(leader), etc in upcoming versions ? >> >> >> Thanks a ton for helping sort this out. I think the patch looks good to >> be merged post addressing of the comments by Justin along with the man page >> details for ovsdb-tool. >> >> >> I will do some more crash testing for the cluster along with the scale >> test and keep you posted if something unexpected is noted. >> >> >> >> Regards, >> >> >> >> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <nusid...@redhat.com> >> wrote: >> >>> >>> >>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginw...@asu.edu> wrote: >>> >>>> Sure. 
>>>> >>>> To add on , I also ran for nb db too using different port and Node2 >>>> crashes with same error : >>>> # Node 2 >>>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138 >>>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645" >>>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb >>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot >>>> identify file type >>>> >>>> >>&g
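The duplicate tunnel_key error described earlier in this thread is what you would expect from two independent allocators working from the same database snapshot. A minimal sketch of that race (illustrative Python of my own, not OVN's actual allocation code):

```python
# Hypothetical sketch (not OVN code): with more than one active ovn-northd,
# each instance computes the next free Datapath_Binding tunnel_key from its
# own snapshot of the Southbound DB. Two instances can therefore pick the
# same key, and one transaction then fails with the unique-index
# "constraint violation" error shown in the logs.

def next_tunnel_key(snapshot):
    """Smallest unused key, as each instance would compute independently."""
    used = {row["tunnel_key"] for row in snapshot}
    key = 1
    while key in used:
        key += 1
    return key

# Both "active" instances read the same stale view before either commit lands.
stale_view = []
key_a = next_tunnel_key(stale_view)
key_b = next_tunnel_key(stale_view)
print(key_a, key_b)  # both instances chose the same key
```

With a single active northd (the intended active/standby setup), only one allocator runs at a time, so the race disappears.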
Re: [ovs-discuss] raft ovsdb clustering
Hi Numan:

I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with a fresh installation, and it worked super fine for both sb and nb dbs. Seems like some kernel issue on the previous nodes when I re-installed the raft patch, as I was running a different ovs version on those nodes before.

For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642", started the controller, and it works super fine.

Did some failover testing by rebooting/killing the leader (10.169.125.152) and bringing it back up, and it works as expected. Nothing weird noted so far.

# check-cluster gives the below data on one of the nodes (10.148.181.162) after leader failure:
ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log entries only up to index 18446744073709551615, but index 9 was committed in a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)

For check-cluster, are we planning to add more output showing which node is active (leader), etc. in upcoming versions?

Thanks a ton for helping sort this out. I think the patch looks good to be merged after addressing the comments by Justin, along with the man page details for ovsdb-tool.

I will do some more crash testing for the cluster along with the scale test and keep you posted if something unexpected is noted.

Regards,

On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <nusid...@redhat.com> wrote:
> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginw...@asu.edu> wrote:
>> Sure.
>> To add on , I also ran for nb db too using different port and Node2 crashes with same error :
>> # Node 2
>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138 --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645" --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify file type
> Hi Aliasgar,
> It worked for me.
Can you delete the old db files in /etc/openvswitch/ and > try running the commands again ? > > Below are the commands I ran in my setup. > > Node 1 > --- > sudo /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.121.91 > --db-sb-port=6642 --db-sb-create-insecure-remote=yes > --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb > > Node 2 > - > sudo /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.121.87 > --db-sb-port=6642 --db-sb-create-insecure-remote=yes > --db-sb-cluster-local-addr="tcp:192.168.121.87:6644" > --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" start_sb_ovsdb > > Node 3 > - > sudo /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.121.78 > --db-sb-port=6642 --db-sb-create-insecure-remote=yes > --db-sb-cluster-local-addr="tcp:192.168.121.78:6644" > --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" start_sb_ovsdb > > > > Thanks > Numan > > > > > >> >> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <nusid...@redhat.com> >> wrote: >> >>> >>> >>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginw...@asu.edu> wrote: >>> >>>> Thanks Numan for the response. >>>> >>>> There is no command start_cluster_sb_ovsdb in the source code too. Is >>>> that in a separate commit somewhere? Hence, I used start_sb_ovsdb >>>> which I think would not be a right choice? >>>> >>> >>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let >>> me try it out again and update this thread. >>> >>> Thanks >>> Numan >>> >>> >>>> >>>> # Node1 came up as expected. >>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 >>>> --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp: >>>> 10.99.152.148:6644" start_sb_ovsdb. >>>> >>>> # verifying its a clustered db with ovsdb-tool db-local-address >>>> /etc/openvswitch/ovnsb_db.db >>>> tcp:10.99.152.148:6644 >>>> # ovn-sbctl show works fine and chassis are being populated correctly. 
>>>> >>>> #Node 2 fails with error: >>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 >>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes >>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb >>>&g
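The failover behavior reported above (kill the leader of a 3-node cluster, a new leader takes over) follows from raft's majority rule: an election succeeds as long as a strict majority of the known servers can vote. A quick sketch of the arithmetic (my own illustration, not OVS code):

```python
# Sketch of raft majority arithmetic (not OVS code): a candidate wins an
# election with votes from a strict majority of the known servers. This is
# also why a freshly created single-server cluster elects itself, matching
# the "elected leader by 1+ of 1 servers" log line seen in this thread.

def majority(cluster_size):
    """Minimum number of votes needed to win an election."""
    return cluster_size // 2 + 1

for n in (1, 2, 3, 5):
    print(f"{n} server(s): need {majority(n)} vote(s)")
```

So a 3-node cluster tolerates one node down (2 of 3 still form a majority), which matches the leader-failure test above.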
Re: [ovs-discuss] raft ovsdb clustering
Sure. To add on, I also ran the nb db too, using a different port, and Node 2 crashes with the same error:

# Node 2
/usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138 --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645" --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify file type

On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <nusid...@redhat.com> wrote:
> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginw...@asu.edu> wrote:
>> Thanks Numan for the response.
>> There is no command start_cluster_sb_ovsdb in the source code too. Is that in a separate commit somewhere? Hence, I used start_sb_ovsdb which I think would not be a right choice?
> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let me try it out again and update this thread.
> Thanks
> Numan
>> # Node1 came up as expected.
>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_sb_ovsdb
>> # verifying its a clustered db with ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>> tcp:10.99.152.148:6644
>> # ovn-sbctl show works fine and chassis are being populated correctly.
>> >> #Node 2 fails with error: >> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 >> --db-sb-port=6642 --db-sb-create-insecure-remote=yes >> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb >> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify >> file type >> >> # So i did start the sb db the usual way using start_ovsdb to just get >> the db file created and killed the sb pid and re-ran the command which gave >> actual error where it complains for join-cluster command that is being >> called internally >> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 >> --db-sb-port=6642 --db-sb-create-insecure-remote=yes >> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb >> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database >> * Backing up database to /etc/openvswitch/ovnsb_db.db.b >> ackup1.15.0-70426956 >> ovsdb-tool: 'join-cluster' command requires at least 4 arguments >> * Creating cluster database /etc/openvswitch/ovnsb_db.db from existing >> one >> >> >> # based on above error I killed the sb db pid again and try to create a >> local cluster on node then re-ran the join operation as per the source >> code function. >> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp: >> 10.99.152.138:6644 tcp:10.99.152.148:6644 which still complains >> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed (File >> exists) >> >> >> # Node 3: I did not try as I am assuming the same failure as node 2 >> >> >> Let me know may know further. 
>> >> >> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <nusid...@redhat.com> >> wrote: >> >>> Hi Aliasgar, >>> >>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginw...@asu.edu> wrote: >>> >>>> Hi Ben/Noman: >>>> >>>> I am trying to setup 3 node southbound db cluster using raft10 >>>> <https://patchwork.ozlabs.org/patch/854298/> in review. >>>> >>>> # Node 1 create-cluster >>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db >>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642 >>>> >>> >>> A different port is used for RAFT. So you have to choose another port >>> like 6644 for example. >>> >> >>>> >>>> # Node 2 >>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp: >>>> 10.99.152.138:6642 tcp:10.99.152.148:6642 --cid >>>> 5dfcb678-bb1d-4377-b02d-a380edec2982 >>>> >>>> #Node 3 >>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp: >>>> 10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid >>>> 5dfcb678-bb1d-4377-b02d-a380edec2982 >>>> >>>> # ovn remote is set to all 3 nodes >>>> external_ids:ovn-remote="tcp:10.99.152.148:6642,
Re: [ovs-discuss] raft ovsdb clustering
Thanks Numan for the response.

There is no start_cluster_sb_ovsdb command in the source code either. Is that in a separate commit somewhere? Hence, I used start_sb_ovsdb, which I think may not be the right choice?

# Node1 came up as expected.
ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_sb_ovsdb

# verifying it's a clustered db with ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
tcp:10.99.152.148:6644
# ovn-sbctl show works fine and chassis are being populated correctly.

# Node 2 fails with error:
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify file type

# So I started the sb db the usual way using start_ovsdb just to get the db file created, killed the sb pid, and re-ran the command. That gave the actual error: it complains about the join-cluster command that is called internally:
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642 --db-sb-create-insecure-remote=yes --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
* Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
ovsdb-tool: 'join-cluster' command requires at least 4 arguments
* Creating cluster database /etc/openvswitch/ovnsb_db.db from existing one

# based on the above error, I killed the sb db pid again, tried to create a local cluster on the node, and then re-ran the join operation as per the source code function.
ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
which still complains:
ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed (File exists)

# Node 3: I did not try, as I am assuming the same failure as node 2.

Let me know if you need anything further.

On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <nusid...@redhat.com> wrote:
> Hi Aliasgar,
> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginw...@asu.edu> wrote:
>> Hi Ben/Numan:
>> I am trying to set up a 3 node southbound db cluster using raft10 <https://patchwork.ozlabs.org/patch/854298/> in review.
>> # Node 1 create-cluster
>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
> A different port is used for RAFT. So you have to choose another port like 6644 for example.
>> # Node 2
>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>> # Node 3
>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound tcp:10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>> # ovn remote is set to all 3 nodes
>> external_ids:ovn-remote="tcp:10.99.152.148:6642,tcp:10.99.152.138:6642,tcp:10.99.152.101:6642"
>> # Starting sb db on node 1 using the below command:
>> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc --log-file=/var/log/openvswitch/ovsdb-server-sb.log --pidfile=/var/run/openvswitch/ovnsb_db.pid --remote=db:OVN_Southbound,SB_Global,connections --unixctl=ovnsb_db.ctl --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
--remote=punix:/var/run/openvswitch/ovnsb_db.sock /etc/openvswitch/ovnsb_db.db

>> # check-cluster is returning nothing
>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db

>> # ovsdb-server-sb.log below shows the leader is elected with only one server, and there are rbac-related debug logs with rpc replies and empty params, with no errors

>> 2018-03-13T01:12:02Z|2|raft|DBG|server 63d1 added to configuration
>> 2018-03-13T01:12:02Z|3|raft|INFO|term 6: starting election
>> 2018-03-13T01:12:02Z|4|raft|INFO|term 6: elected leader by 1+ of 1 servers

>> Now starting the ovsdb-server on the other cluster nodes fails saying
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify file type
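For what it's worth, the "'join-cluster' command requires at least 4 arguments" error earlier in this thread matches the argument shape that the working commands in this thread use: database file, schema name, the local address, then one or more remote addresses. A small sketch of that shape (illustrative Python of my own, not ovsdb-tool source):

```python
# Illustrative validator (not ovsdb-tool source) for the argument shape of:
#   ovsdb-tool join-cluster DB SCHEMA_NAME LOCAL_ADDR REMOTE_ADDR...
# Fewer than 4 positional arguments yields the "requires at least 4
# arguments" error seen earlier in the thread.

def check_join_cluster_args(args):
    if len(args) < 4:
        raise ValueError("'join-cluster' command requires at least 4 arguments")
    return {
        "db": args[0],
        "schema": args[1],
        "local": args[2],
        "remotes": args[3:],  # at least one existing cluster member
    }

parsed = check_join_cluster_args([
    "/etc/openvswitch/ovnsb_db.db", "OVN_Southbound",
    "tcp:10.99.152.138:6644", "tcp:10.99.152.148:6644",
])
print(parsed["remotes"])
```

This is also consistent with ovn-ctl needing both --db-sb-cluster-local-addr and --db-sb-cluster-remote-addr on joining nodes: without the remote address, the internally built join-cluster invocation comes up short.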
Re: [ovs-discuss] OVN load balancing on same subnet failing
Hi:

IRL, we always use different subnets for VIPs for OpenStack workloads in production, for a couple of reasons:
1. It's easier to fail over in case of outages if the VIP and pool members are in different subnets.
2. It is also easier for neutron's IPAM to manage 2 different subnets, one for VIPs and the other for VMs/containers, instead of allocating from the same subnet, because neutron doesn't care whether an allocated IP is used for a VIP or a VM/container.

Hence, I think it's ok to stick with the solution suggested by Guru. If folks in the OpenStack community are explicitly asking for this requirement, this implementation is worth prioritizing.

On Fri, Mar 2, 2018 at 9:40 AM, Guru Shetty wrote:
> On 1 March 2018 at 21:09, Anil Venkata wrote:
>> On Fri, Mar 2, 2018 at 7:23 AM, Guru Shetty wrote:
>>> On 27 February 2018 at 03:13, Anil Venkata wrote:
For example, I have a 10.1.0.0/24 network and a load balancer is added to it with 10.1.0.10 as VIP and 10.1.0.2 (MAC 50:54:00:00:00:01), 10.1.0.3 (MAC 50:54:00:00:00:02) as members.
ovn-nbctl create load_balancer vips:10.1.0.10="10.1.0.2,10.1.0.3"
>>> We currently need the VIP to be in a different subnet. You should connect the switch to a dummy logical router (or connect it to an external router). Since the VIP is in a different subnet, the client sends an ARP for the logical router IP and then things will work.
>> Thanks Guru. Any reason for introducing this constraint (i.e. VIP to be in a different subnet)? Can we address this limitation?
> It was just easy to implement with the constraint. You will need an ARP responder for the VIP. And now, you will have to specify the mac address for each VIP in the schema. So that is a bit of work - but not hard.
>> When I try to send a request from a client within the subnet (i.e. 10.1.0.33), it's not reaching any load balancer members. I noticed ARP is not resolved for VIP 10.1.0.10.
I tried to resolve this in two ways:

1) Adding a new ARP reply ovs flow for VIP 10.1.0.10 with the router port's MAC. When a client tries to connect to the VIP, it will use the router's MAC. The router then gets the packet after load balancing and will forward the packet to the appropriate member.

2) Second approach:
a) Use a new MAC (for example, 50:54:00:00:00:ab) for VIP 10.1.0.10, and add a new ARP reply flow with this MAC.
b) As we are not using a router, when load balancing changes the destination ip, the VIP MAC has to be replaced with the corresponding member's MAC, i.e.:

sudo ovs-ofctl add-flow br-int "table=24,ip,priority=150,dl_dst=50:54:00:00:00:ab,nw_dst=10.1.0.2,action=mod_dl_dst:50:54:00:00:00:01,load:0x1->NXM_NX_REG15[],resubmit(,32)"
sudo ovs-ofctl add-flow br-int "table=24,ip,priority=150,dl_dst=50:54:00:00:00:ab,nw_dst=10.1.0.3,action=mod_dl_dst:50:54:00:00:00:02,load:0x2->NXM_NX_REG15[],resubmit(,32)"

Which approach will be better, or is there any alternate solution?

Thanks
Anil

___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
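Anil's second approach boils down to a static (VIP MAC, member IP) to member MAC rewrite table, which is exactly what the two ovs-ofctl flows in his mail encode. A tiny sketch of that mapping (values copied from those flows; an illustration of the rewrite logic, not datapath code):

```python
# Sketch of the rewrite table encoded by the two ovs-ofctl flows above:
# a packet addressed to the VIP MAC whose destination IP has already been
# rewritten to a member gets its destination MAC swapped to that member's
# MAC. Values are copied from the flows; this is an illustration only.

VIP_MAC = "50:54:00:00:00:ab"
MEMBER_MACS = {
    "10.1.0.2": "50:54:00:00:00:01",
    "10.1.0.3": "50:54:00:00:00:02",
}

def rewrite_dst_mac(dl_dst, nw_dst):
    """Return the new destination MAC for a packet, per the flow table."""
    if dl_dst == VIP_MAC and nw_dst in MEMBER_MACS:
        return MEMBER_MACS[nw_dst]
    return dl_dst  # no flow matched; leave the packet unchanged

print(rewrite_dst_mac(VIP_MAC, "10.1.0.2"))
```

The first approach avoids maintaining this table entirely by letting the logical router's own MAC and routing pipeline do the member MAC resolution, which is why the different-subnet-plus-router constraint was easier to implement.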
Re: [ovs-discuss] ovs-vswitchd 100% CPU in OVN scale test
Hi All:

As per the discussions/requests by Mark and Numan, I finally reverted the mtu patch (commit-id 8c319e8b73032e06c7dd1832b3b31f8a1189dcd1) on branch-2.9 and re-ran the test with 10k lports to bind on the farms, with 8 LRs and 40 LSs, and results improved. Since ovs did not go super hot, binding of the 10k ports to HVs completed in 5 hrs 28 minutes vs 8 hours with the mtu patch. Thus, the extra strcmp did add the overhead.

A cpu idle graph of a farm with 50 HVs running 2.9 with/without the mtu patch is available @ https://raw.githubusercontent.com/noah8713/ovn-scale-test/4cef99dbe9a0677a1b2d852b7f4f429ede340875/results/overlay/farm_cpu_2.9.png which indicates that running without the mtu patch had a higher idle cpu percentage than running with it. In addition, I have also captured ovs-vswitchd process cpu utilization on the farm, since ovs itself was creating a bottleneck by slowing down the port binding on the computes. That graph is available @ https://raw.githubusercontent.com/noah8713/ovn-scale-test/scale_results/results/overlay/ovs-vswitchd_2.9_util.png

Hence, overall performance improved, which resulted in faster completion of all 10k port bindings.

On Thu, Feb 15, 2018 at 12:20 PM, Mark Michelson wrote:
> On 02/08/2018 07:55 PM, Han Zhou wrote:
>> On Wed, Feb 7, 2018 at 12:47 PM, Han Zhou <zhou...@gmail.com> wrote:
>> > When doing scale testing for OVN (using https://github.com/openvswitch/ovn-scale-test), we had some interesting findings, and need some help here.
>> > We ran the test "create and bind lports" against branch 2.9 and branch 2.6, and we found that 2.6 was much faster. With some analysis, we found out the reason is not that OVN gets slower in 2.9, but that the bottleneck of this test in branch 2.9 is ovs-vswitchd.
>> > The testing was run in an environment with 20 farm nodes, each of which has 50 sandbox HVs (I will just mention them as HVs in short). Before the test, there are already 9500 lports bound in 950 HVs on 19 farm nodes.
>> > The test run against the last farm node to bind the lports on the 50 HVs there. The steps in the test scenario are:
>> > 1. Create 5 new LSs in NB (so that the LSs will not be shared with any of the HVs on other farm nodes)
>> > 2. Create 100 lports in NB on a LS
>> > 3. Bind these lports on HVs, 2 for each HV. They are bound sequentially on each HV, and for each HV the 2 ports are bound using one command together: ovs-vsctl add-port -- set Interface external-ids:... -- add-port -- set Interface external-ids:... (the script didn't set type to internal, but I hope it is not an issue for this test).
>> > 4. Wait for the port state to change to up in NB for all 100 lports (with a single ovn-nbctl command)
>> > These steps are repeated 5 times, once for each LS. So in the end we got 500 more lports created and bound (the total scale is then 1k HVs and 10k lports).
>> > When running with 2.6, the ovn-controllers were taking most of the CPU time. However, with 2.9, the CPU of the ovn-controllers spikes, but there is always an ovs-vswitchd on the top with 100% CPU. It means ovs-vswitchd is the bottleneck in this testing. There is only one ovs-vswitchd at 100% at a time, and different ovs-vswitchd processes spike one after another, since the ports are bound sequentially on each HV. From the rally log, each binding of 2 ports takes around 4 - 5 seconds. This is just the ovs-vsctl command execution time. The 100% CPU of ovs-vswitchd explains the slowness.
>> > So, based on this result, we cannot use the total time to evaluate the efficiency of OVN; instead we can evaluate by the CPU cost of the ovn-controller processes. In fact, 2.9 ovn-controller costs around 70% less CPU than 2.6, which I think is due to some optimization we made earlier. (With my work-in-progress patch it saves much more, and I will post later as RFC).
>> > However, I cannot explain why ovs-vswitchd is getting slower than 2.6 when doing port-binding. We need expert suggestions here on what could be the possible reason for this slowness. We can do more testing with different versions between 2.6 and 2.9 to find the related change, but with some pointers it might save some effort. Below are some logs of ovs-vswitchd when port binding is happening:
>> > ==
>> > 2018-02-07T00:12:54.558Z|01767|bridge|INFO|bridge br-int: added interface lport_bc65cd_QFOU3v on port 1028
>> > 2018-02-07T00:12:55.629Z|01768|timeval|WARN|Unreasonably long 1112ms poll interval (1016ms user, 4ms system)
>> > 2018-02-07T00:12:55.629Z|01769|timeval|WARN|faults: 336 minor, 0 major
>> > 2018-02-07T00:12:55.629Z|01770|timeval|WARN|context switches: 0 voluntary, 13 involuntary
>> > 2018-02-07T00:12:55.629Z|01771|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=b256889c:
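For reference, the 5 hrs 28 minutes vs 8 hours figures reported at the top of this thread work out as follows (simple arithmetic on the numbers in the mail, nothing more):

```python
# Arithmetic on the reported results: 10k port bindings completed in
# 5 h 28 min after reverting the mtu patch, vs 8 h with it.
PORTS = 10_000
without_mtu_min = 5 * 60 + 28   # 328 minutes
with_mtu_min = 8 * 60           # 480 minutes

rate_without = PORTS / without_mtu_min   # binding rate without the patch
rate_with = PORTS / with_mtu_min         # binding rate with the patch
saved_pct = 100 * (1 - without_mtu_min / with_mtu_min)

print(f"{rate_without:.1f} vs {rate_with:.1f} ports/min, "
      f"{saved_pct:.0f}% less wall time")
# prints: 30.5 vs 20.8 ports/min, 32% less wall time
```

So reverting the patch raised the aggregate binding rate from roughly 21 to roughly 30 ports per minute, about a third less wall time overall.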