Greetings OVS Discuss Group:

First, I want to apologize for the wall of text to follow. I wasn't sure how much information would be wanted, and I didn't want everyone to have to ask 100 questions to get what they needed.

I am working on setting up an OVN Controller (3 nodes) using Pacemaker / Corosync. For reference, a single-node controller (no Pacemaker / Corosync) operates without issue.

Root Goal: Failover redundancy for the OVN Controller. This allows for maintenance of controller nodes and/or failure of a controller node.

I have been following the very limited documentation on how to set up this environment, but I don't seem to be having much luck getting it 100% stable or operational.

http://docs.openvswitch.org/en/latest/topics/integration/?highlight=pacemaker

If I could ask someone to review the installation below and provide some insight into what I may be doing wrong or have wrong (need a newer version of code, etc.), I would be grateful. So far I have only attempted this using "packaged" versions of the code to "keep it simple", but I do realize that getting this to work may require a newer code release.

Along with this, once I am able to get a stable environment, I would like to contribute updated documentation to the community on how to perform a full setup.

-- Environment -

# cat /etc/lsb-release | grep DESCRIPTION
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"

# ovn-northd --version
ovn-northd (Open vSwitch) 2.9.2

# ovn-nbctl --version
ovn-nbctl (Open vSwitch) 2.9.2
DB Schema 5.10.0

# ovn-sbctl --version
ovn-sbctl (Open vSwitch) 2.9.2
DB Schema 1.15.0

-- Setup Steps -

# cat /etc/hosts
# LAB - Compute Nodes
192.168.100.10 ctrl00
192.168.100.11 ctrl01
192.168.100.12 ctrl02
192.168.100.13 ctrl03
192.168.100.76 cn01
192.168.100.77 cn02
192.168.100.78 cn03
192.168.100.79 cn04

### All Controllers

## System Package Updates
apt clean all; apt update; apt -y dist-upgrade

## Time Sync Services (NTP) [ctrl01, ctrl02, ctrl03]
apt install -y ntp

## Install OVN Central Controller [ctrl01, ctrl02, ctrl03]
apt install -y openvswitch-common openvswitch-switch python-openvswitch python3-openvswitch ovn-common ovn-central

## Install pacemaker and corosync [ctrl01, ctrl02, ctrl03]
apt install -y pcs pacemaker pacemaker-cli-utils

## Reset the Pacemaker Password [ctrl01, ctrl02, ctrl03]
Password = 6B43WAmuPzM2Ewsr
enc_passwd=$(python3 -c 'import crypt; print(crypt.crypt("6B43WAmuPzM2Ewsr", crypt.mksalt(crypt.METHOD_SHA512)))')
usermod -p "${enc_passwd}" hacluster

## Enable and Start the PCS Daemon [ctrl01, ctrl02, ctrl03]
sudo systemctl enable pcsd; sudo systemctl start pcsd; sudo systemctl status pcsd;

### Cluster Controller #1 (ONLY)

## Enable Cluster Services [ctrl01 --ONLY-- ]
pcs cluster auth ctrl01 ctrl02 ctrl03 -u hacluster -p '6B43WAmuPzM2Ewsr' --force
pcs cluster setup --name OVN-CLUSTER ctrl01 ctrl02 ctrl03 --force
pcs cluster enable --all; pcs cluster start --all;
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore

pcs status
[[ output ]]
Cluster name: OVN-CLUSTER
Stack: corosync
Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Wed May 1 09:42:33 2019
Last change: Wed May 1 09:42:29 2019 by root via cibadmin on ctrl01

3 nodes configured
0 resources configured

Online: [ ctrl01 ctrl02 ctrl03 ]

No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

## Add cluster resources [ctrl01 --ONLY-- ]
pcs resource create ovn-virtual-ip ocf:heartbeat:IPaddr2 nic=ens192 ip=192.168.100.10 cidr_netmask=24 op monitor interval=30s

pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
    master_ip=192.168.100.10 \
    ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl \
    op monitor interval="10s" \
    op monitor role=Master interval="15s"

pcs resource master ovndb_servers-master ovndb_servers \
    meta notify="true"

pcs constraint order promote ovndb_servers-master then ovn-virtual-ip
pcs constraint colocation add ovn-virtual-ip with master ovndb_servers-master score=INFINITY

pcs status
[[ output ]]
Cluster name: OVN-CLUSTER
Stack: corosync
Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Wed May 1 09:46:02 2019
Last change: Wed May 1 09:44:53 2019 by root via crm_attribute on ctrl02

3 nodes configured
4 resources configured

Online: [ ctrl01 ctrl02 ctrl03 ]

Full list of resources:

 ovn-virtual-ip (ocf::heartbeat:IPaddr2): Started ctrl02
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ ctrl02 ]
     Slaves: [ ctrl01 ctrl03 ]

Failed Actions:
* ovndb_servers_monitor_10000 on ctrl01 'master' (8): call=18, status=complete, exitreason='',
    last-rc-change='Wed May 1 09:43:28 2019', queued=0ms, exec=73ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

/////

At this point, I execute 'pcs cluster stop {{ controller }}; pcs cluster start {{ controller }}' for each controller, one at a time, and eventually everything clears up.

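For anyone reproducing this, the one-at-a-time restart described above can be sketched as a small loop. This is only a sketch of what I do by hand: the `run` and `rolling_restart` helper names, the `DRY_RUN` switch, and the `SETTLE_SECS` pause (30 seconds by default) are my own assumptions for illustration and are not part of pcs or the OVN documentation.

```shell
#!/usr/bin/env bash
# Sketch: restart cluster services on each controller, one node at a time.
# The helper names and the DRY_RUN / SETTLE_SECS variables are hypothetical.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop and start cluster services on every node given as an argument.
rolling_restart() {
    local node
    for node in "$@"; do
        run pcs cluster stop "$node"
        run pcs cluster start "$node"
        # Assumed pause so the node can rejoin before the next one is stopped.
        run sleep "${SETTLE_SECS:-30}"
    done
}
```

Running `DRY_RUN=1 rolling_restart ctrl01 ctrl02 ctrl03` only prints the commands; without `DRY_RUN` they are executed in order.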
# pcs status
Cluster name: OVN-CLUSTER
Stack: corosync
Current DC: ctrl02 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Wed May 1 09:47:50 2019
Last change: Wed May 1 09:44:53 2019 by root via crm_attribute on ctrl02

3 nodes configured
4 resources configured

Online: [ ctrl01 ctrl02 ctrl03 ]

Full list of resources:

 ovn-virtual-ip (ocf::heartbeat:IPaddr2): Started ctrl02
 Master/Slave Set: ovndb_servers-master [ovndb_servers]
     Masters: [ ctrl02 ]
     Slaves: [ ctrl01 ctrl03 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

/////

lab-kvmctrl-01:~# ovn-sbctl show
Chassis "3fd25b76-3170-4eab-8604-690182500478"
    hostname: "lab-vxlan-cn03"
    Encap geneve
        ip: "192.168.100.78"
        options: {csum="true"}
Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
    hostname: "lab-vxlan-cn01"
    Encap geneve
        ip: "192.168.100.76"
        options: {csum="true"}
Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
    hostname: "lab-vxlan-cn02"
    Encap geneve
        ip: "192.168.100.77"
        options: {csum="true"}
Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
    hostname: "lab-vxlan-cn04"
    Encap geneve
        ip: "192.168.100.79"
        options: {csum="true"}

/////

Now I turn up a VM on CN01 and it connects to 'br-int' for one of the interfaces. OVN_NB has the logical switch configured, and the port assignment. OVN_SB never receives the port state 'online' from the CN.

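The checks I would expect to be relevant to this symptom can be sketched as follows. ovn-controller binds a logical port when an OVS interface carries an `external_ids:iface-id` equal to the logical port name, so the sketch compares that value on the compute node with what the northbound and southbound databases report. The helper names (`run`, `cn_checks`, `ctrl_checks`) and the `DRY_RUN` switch are hypothetical, and whether these commands actually expose the cause is an assumption on my part.

```shell
#!/usr/bin/env bash
# Sketch: commands to inspect the binding of one VM port.
# Helper names and the DRY_RUN variable are hypothetical.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the compute node: show the iface-id set on the OVS interface.
# Argument: OVS interface name of the VM port.
cn_checks() {
    run ovs-vsctl get Interface "$1" external_ids:iface-id
}

# On the controller: show the port state in the NB DB and its binding in the SB DB.
# Argument: logical switch port name.
ctrl_checks() {
    run ovn-nbctl lsp-get-up "$1"
    run ovn-sbctl --columns=logical_port,chassis find Port_Binding "logical_port=$1"
}
```

`cn_checks` takes the OVS interface name of the VM port, and `ctrl_checks` takes the logical switch port name; with `DRY_RUN=1` both only print the commands they would run.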
# ovs-vsctl show
aca349bd-6a23-47f6-98da-a35773753858
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-bda632-0"
            Interface "ovn-bda632-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.100.77"}
        Port "ovn-3fd25b-0"
            Interface "ovn-3fd25b-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.100.78"}
        Port "ovn-5bd199-0"
            Interface "ovn-5bd199-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="192.168.100.79"}
        Port "525401c1d4e1"                <<<<<<<<< VM PORT
            Interface "525401c1d4e1"       <<<<<<<<< VM PORT

///// VM DOMXML - INTERFACE /////

<interface type='bridge'>
  <mac address='52:54:01:c1:d4:e1'/>
  <source bridge='br-int'/>
  <virtualport type='openvswitch'>
    <parameters interfaceid='9de72cf5-cbb2-4ebe-9c89-3962a29ed869'/>
  </virtualport>
  <target dev='525401c1d4e1'/>
  <model type='virtio'/>
  <alias name='net1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</interface>

# ovn-nbctl show
switch 9f87e014-4b3d-40e4-9a42-9f0b28957c05 (ls_1234)
    port 9de72cf5-cbb2-4ebe-9c89-3962a29ed869
        addresses: ["52:54:01:c1:d4:e1"]

# ovn-sbctl show
Chassis "3fd25b76-3170-4eab-8604-690182500478"
    hostname: "lab-vxlan-cn03"
    Encap geneve
        ip: "192.168.100.78"
        options: {csum="true"}
Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
    hostname: "lab-vxlan-cn04"
    Encap geneve
        ip: "192.168.100.79"
        options: {csum="true"}
Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
    hostname: "lab-vxlan-cn02"
    Encap geneve
        ip: "192.168.100.77"
        options: {csum="true"}
Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
    hostname: "lab-vxlan-cn01"
    Encap geneve
        ip: "192.168.100.76"
        options: {csum="true"}

lab-vxlan-cn01# ovs-vsctl list open_vswitch | grep external_ids
external_ids : {hostname="lab-vxlan-cn01", ovn-encap-ip="192.168.100.76", ovn-encap-type=geneve, ovn-nb="tcp:192.168.100.10:6641", ovn-remote="tcp:192.168.100.10:6642", rundir="/var/run/openvswitch", system-id="0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"}

///// Netstat output from "master" controller /////

# netstat -antp | grep -v sshd | grep -v WAIT
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address          Foreign Address        State       PID/Program name
tcp        0      0 0.0.0.0:2224           0.0.0.0:*              LISTEN      1885/ruby
tcp        0      0 192.168.100.10:6641    0.0.0.0:*              LISTEN      3493/ovsdb-server
tcp        0      0 192.168.100.10:6642    0.0.0.0:*              LISTEN      3503/ovsdb-server
tcp        0      0 192.168.100.12:48362   192.168.100.12:2224    ESTABLISHED 1885/ruby
tcp        0      0 192.168.100.12:2224    192.168.100.11:58166   ESTABLISHED 1885/ruby
tcp        0      0 192.168.100.10:6642    192.168.100.13:34044   ESTABLISHED 3503/ovsdb-server
tcp        0      0 192.168.100.12:55480   192.168.100.13:2224    ESTABLISHED 1885/ruby
tcp        0      0 192.168.100.12:2224    192.168.100.12:48362   ESTABLISHED 1885/ruby
tcp        0      0 192.168.100.12:2224    192.168.100.13:43488   ESTABLISHED 1885/ruby
tcp        0      0 192.168.100.10:6641    192.168.100.13:34148   ESTABLISHED 3493/ovsdb-server
tcp        0      0 192.168.100.10:6642    192.168.100.79:47974   ESTABLISHED 3503/ovsdb-server
tcp        0      0 192.168.100.10:6641    192.168.100.11:60226   ESTABLISHED 3493/ovsdb-server
tcp        0      0 192.168.100.10:6642    192.168.100.76:55570   ESTABLISHED 3503/ovsdb-server
tcp        0      0 192.168.100.10:6642    192.168.100.78:36428   ESTABLISHED 3503/ovsdb-server
tcp        0      0 192.168.100.10:6642    192.168.100.11:40682   ESTABLISHED 3503/ovsdb-server
tcp        0      0 192.168.100.10:6642    192.168.100.77:58772   ESTABLISHED 3503/ovsdb-server
tcp        0      0 192.168.100.12:41974   192.168.100.11:2224    ESTABLISHED 1885/ruby
tcp6       0      0 :::2224                :::*                   LISTEN      1885/ruby

Regards,
Stephen Flynn
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
