Hi Stephen,

Your setup looks fine to me. Please see below for some comments.
On Wed, May 1, 2019 at 8:23 PM Stephen Flynn via discuss <[email protected]> wrote:

> Greetings OVS Discuss Group:
>
> First - I want to apologize for the wall of text to follow - I wasn't sure
> how much information would be wanted and I didn't want everyone to have to
> ask 100 questions to get what they needed.
>
> I am working on setting up an OVN Controller (3 nodes) using Pacemaker /
> Corosync. For reference, a single-node controller (no Pacemaker /
> Corosync) operates without issue.
>
> Root Goal: failover redundancy for the OVN Controller - allows for
> maintenance of controller nodes and/or failure of a controller node.

Just a small correction - Pacemaker is used to provide active/passive HA
for the OVN DB servers. Calling this the "OVN Controller" is a bit
confusing, since there is also a service called ovn-controller (provided
by the ovn-host package) which runs on each host/hypervisor.

> I have been following the very limited documentation on how to set up this
> environment but don't seem to be having much luck getting it to be 100%
> stable or operational.
>
> http://docs.openvswitch.org/en/latest/topics/integration/?highlight=pacemaker
>
> If I could ask someone to assist in reviewing the installation below and
> provide some insight into what I may be doing wrong, or have wrong (need a
> newer version of code, etc.), I would be grateful. I've currently only
> attempted this using "packaged" versions of code to "keep it simple", but
> I do realize that getting this to work may require a newer code release.
> Once I am able to get a stable environment, I would also like to
> contribute updated documentation to the community on how to perform a
> full setup.
> *-- Environment --*
>
> # cat /etc/lsb-release | grep DESCRIPTION
> DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
>
> # ovn-northd --version
> ovn-northd (Open vSwitch) 2.9.2
>
> # ovn-nbctl --version
> ovn-nbctl (Open vSwitch) 2.9.2
> DB Schema 5.10.0
>
> # ovn-sbctl --version
> ovn-sbctl (Open vSwitch) 2.9.2
> DB Schema 1.15.0
>
> *-- Setup Steps --*
>
> # cat /etc/hosts
>
> # LAB - Compute Nodes
> 192.168.100.10  ctrl00
> 192.168.100.11  ctrl01
> 192.168.100.12  ctrl02
> 192.168.100.13  ctrl03
> 192.168.100.76  cn01
> 192.168.100.77  cn02
> 192.168.100.78  cn03
> 192.168.100.79  cn04
>
> ### All Controllers
>
> ## System Package Updates
> apt clean all; apt update; apt -y dist-upgrade
>
> ## Time Sync Services (NTP) [ctrl01, ctrl02, ctrl03]
> apt install -y ntp
>
> ## Install OVN Central Controller [ctrl01, ctrl02, ctrl03]
> apt install -y openvswitch-common openvswitch-switch python-openvswitch \
>     python3-openvswitch ovn-common ovn-central
>
> ## Install Pacemaker and Corosync [ctrl01, ctrl02, ctrl03]
> apt install -y pcs pacemaker pacemaker-cli-utils
>
> ## Reset the Pacemaker Password [ctrl01, ctrl02, ctrl03]
> Password = 6B43WAmuPzM2Ewsr
>
> enc_passwd=$(python3 -c 'import crypt; print(crypt.crypt("6B43WAmuPzM2Ewsr", crypt.mksalt(crypt.METHOD_SHA512)))')
> usermod -p "${enc_passwd}" hacluster
>
> ## Enable and Start the PCS Daemon [ctrl01, ctrl02, ctrl03]
> sudo systemctl enable pcsd;
> sudo systemctl start pcsd;
> sudo systemctl status pcsd;
>
> ### Cluster Controller #1 (ONLY)
>
> ## Enable Cluster Services [ctrl01 --ONLY--]
> pcs cluster auth ctrl01 ctrl02 ctrl03 -u hacluster -p '6B43WAmuPzM2Ewsr' --force
> pcs cluster setup --name OVN-CLUSTER ctrl01 ctrl02 ctrl03 --force
>
> pcs cluster enable --all;
> pcs cluster start --all;
>
> pcs property set stonith-enabled=false
> pcs property set no-quorum-policy=ignore
> pcs status
>
> [[ output ]]
> Cluster name: OVN-CLUSTER
> Stack: corosync
> Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Wed May  1 09:42:33 2019
> Last change: Wed May  1 09:42:29 2019 by root via cibadmin on ctrl01
>
> 3 nodes configured
> 0 resources configured
>
> Online: [ ctrl01 ctrl02 ctrl03 ]
>
> No resources
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> ## Add cluster resources [ctrl01 --ONLY--]
> pcs resource create ovn-virtual-ip ocf:heartbeat:IPaddr2 nic=ens192 \
>     ip=192.168.100.10 cidr_netmask=24 op monitor interval=30s
>
> pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
>     master_ip=192.168.100.10 \
>     ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl \
>     op monitor interval="10s" \
>     op monitor role=Master interval="15s"
>
> pcs resource master ovndb_servers-master ovndb_servers \
>     meta notify="true"
>
> pcs constraint order promote ovndb_servers-master then ovn-virtual-ip
> pcs constraint colocation add ovn-virtual-ip with master ovndb_servers-master score=INFINITY
>
> pcs status
>
> [[ output ]]
> Cluster name: OVN-CLUSTER
> Stack: corosync
> Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Wed May  1 09:46:02 2019
> Last change: Wed May  1 09:44:53 2019 by root via crm_attribute on ctrl02
>
> 3 nodes configured
> 4 resources configured
>
> Online: [ ctrl01 ctrl02 ctrl03 ]
>
> Full list of resources:
>
>  ovn-virtual-ip (ocf::heartbeat:IPaddr2): Started ctrl02
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>      Masters: [ ctrl02 ]
>      Slaves: [ ctrl01 ctrl03 ]
>
> Failed Actions:
> * ovndb_servers_monitor_10000 on ctrl01 'master' (8): call=18, status=complete, exitreason='',
>     last-rc-change='Wed May  1 09:43:28 2019', queued=0ms, exec=73ms
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> /////
>
> At this point, I execute 'pcs cluster stop {{ controller }}; pcs cluster
> start {{ controller }}' for each controller, one at a time, and eventually
> everything clears up.
>
> # pcs status
> Cluster name: OVN-CLUSTER
> Stack: corosync
> Current DC: ctrl02 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Wed May  1 09:47:50 2019
> Last change: Wed May  1 09:44:53 2019 by root via crm_attribute on ctrl02
>
> 3 nodes configured
> 4 resources configured
>
> Online: [ ctrl01 ctrl02 ctrl03 ]
>
> Full list of resources:
>
>  ovn-virtual-ip (ocf::heartbeat:IPaddr2): Started ctrl02
>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>      Masters: [ ctrl02 ]
>      Slaves: [ ctrl01 ctrl03 ]
>
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> /////
>
> lab-kvmctrl-01:~# ovn-sbctl show
> Chassis "3fd25b76-3170-4eab-8604-690182500478"
>     hostname: "lab-vxlan-cn03"
>     Encap geneve
>         ip: "192.168.100.78"
>         options: {csum="true"}
> Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
>     hostname: "lab-vxlan-cn01"
>     Encap geneve
>         ip: "192.168.100.76"
>         options: {csum="true"}
> Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
>     hostname: "lab-vxlan-cn02"
>     Encap geneve
>         ip: "192.168.100.77"
>         options: {csum="true"}
> Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
>     hostname: "lab-vxlan-cn04"
>     Encap geneve
>         ip: "192.168.100.79"
>         options: {csum="true"}
>
> /////
>
> Now I turn up a VM on cn01 and it connects to 'br-int' for one of its
> interfaces.
> OVN_NB has the logical switch configured, and the port assignment.
> OVN_SB never receives the port state 'online' from the CN.
> # ovs-vsctl show
> aca349bd-6a23-47f6-98da-a35773753858
>     Bridge br-int
>         fail_mode: secure
>         Port br-int
>             Interface br-int
>                 type: internal
>         Port "ovn-bda632-0"
>             Interface "ovn-bda632-0"
>                 type: geneve
>                 options: {csum="true", key=flow, remote_ip="192.168.100.77"}
>         Port "ovn-3fd25b-0"
>             Interface "ovn-3fd25b-0"
>                 type: geneve
>                 options: {csum="true", key=flow, remote_ip="192.168.100.78"}
>         Port "ovn-5bd199-0"
>             Interface "ovn-5bd199-0"
>                 type: geneve
>                 options: {csum="true", key=flow, remote_ip="192.168.100.79"}
>         Port "525401c1d4e1"            <<<<<<<<< VM PORT
>             Interface "525401c1d4e1"   <<<<<<<<< VM PORT
>
> /////
> VM DOMXML - INTERFACE
> /////
> <interface type='bridge'>
>   <mac address='52:54:01:c1:d4:e1'/>
>   <source bridge='br-int'/>
>   <virtualport type='openvswitch'>
>     <parameters interfaceid='9de72cf5-cbb2-4ebe-9c89-3962a29ed869'/>
>   </virtualport>
>   <target dev='525401c1d4e1'/>
>   <model type='virtio'/>
>   <alias name='net1'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
> </interface>
>
> # ovn-nbctl show
> switch 9f87e014-4b3d-40e4-9a42-9f0b28957c05 (ls_1234)
>     port 9de72cf5-cbb2-4ebe-9c89-3962a29ed869
>         addresses: ["52:54:01:c1:d4:e1"]
>
> # ovn-sbctl show
> Chassis "3fd25b76-3170-4eab-8604-690182500478"
>     hostname: "lab-vxlan-cn03"
>     Encap geneve
>         ip: "192.168.100.78"
>         options: {csum="true"}
> Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
>     hostname: "lab-vxlan-cn04"
>     Encap geneve
>         ip: "192.168.100.79"
>         options: {csum="true"}
> Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
>     hostname: "lab-vxlan-cn02"
>     Encap geneve
>         ip: "192.168.100.77"
>         options: {csum="true"}
> Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
>     hostname: "lab-vxlan-cn01"
>     Encap geneve
>         ip: "192.168.100.76"
>         options: {csum="true"}
>
> lab-vxlan-cn01# ovs-vsctl list open_vswitch | grep external_ids
> external_ids : {hostname="lab-vxlan-cn01",
>     ovn-encap-ip="192.168.100.76", ovn-encap-type=geneve,
>     ovn-nb="tcp:192.168.100.10:6641", ovn-remote="tcp:192.168.100.10:6642",
>     rundir="/var/run/openvswitch",
>     system-id="0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"}

If you are familiar with Puppet, you can refer to [1], which creates the
ocf:ovn:ovndb-servers resource. That said, the steps you provided above to
set up the Pacemaker cluster seem fine to me.

[1] - https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp

You can do a few things to check whether your setup is fine:

1. Check that the ovn-controllers on nodes cn[01-04] are able to
   communicate with the OVN Southbound DB. On the master node you can
   delete a chassis, e.g. "ovn-sbctl chassis-del
   3fd25b76-3170-4eab-8604-690182500478", and then run "ovn-sbctl show".
   If the chassis record for cn03 reappears, then it's fine.

2. Run "ovn-nbctl --db=tcp:192.168.100.10:6641 show" to make sure you are
   able to talk to the OVN DB servers.

3. Look at ovn-controller.log on cn01 and see whether it has claimed the
   port or not. In the logs you should see "Claiming port ...".
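To make check #1 above less manual, here is a small sketch (my own, not
part of any OVN tooling) that parses `ovn-sbctl show` output and reports
any expected hypervisors that have no chassis record. The sample text is
taken from the output earlier in this mail; in practice you would feed in
the live output, e.g. via `subprocess.check_output(["ovn-sbctl", "show"])`:

```python
import re

def registered_hostnames(sbctl_show_output):
    """Return the set of chassis hostnames found in `ovn-sbctl show` output."""
    return set(re.findall(r'hostname:\s+"([^"]+)"', sbctl_show_output))

# Sample `ovn-sbctl show` output (trimmed to two chassis for brevity).
sample = '''
Chassis "3fd25b76-3170-4eab-8604-690182500478"
    hostname: "lab-vxlan-cn03"
    Encap geneve
        ip: "192.168.100.78"
        options: {csum="true"}
Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
    hostname: "lab-vxlan-cn01"
    Encap geneve
        ip: "192.168.100.76"
        options: {csum="true"}
'''

expected = {"lab-vxlan-cn01", "lab-vxlan-cn03"}
missing = expected - registered_hostnames(sample)
print("missing chassis:", sorted(missing))  # empty list: all hosts registered
```

If a hypervisor shows up in `missing` (or a deleted chassis never
reappears), that points at its ovn-controller not reaching the Southbound
DB on tcp:192.168.100.10:6642.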
Thanks
Numan

> /////
> Netstat output from "master" controller
> /////
>
> # netstat -antp | grep -v sshd | grep -v WAIT
> Active Internet connections (servers and established)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State        PID/Program name
> tcp        0      0 0.0.0.0:2224            0.0.0.0:*               LISTEN       1885/ruby
> tcp        0      0 192.168.100.10:6641     0.0.0.0:*               LISTEN       3493/ovsdb-server
> tcp        0      0 192.168.100.10:6642     0.0.0.0:*               LISTEN       3503/ovsdb-server
> tcp        0      0 192.168.100.12:48362    192.168.100.12:2224     ESTABLISHED  1885/ruby
> tcp        0      0 192.168.100.12:2224     192.168.100.11:58166    ESTABLISHED  1885/ruby
> tcp        0      0 192.168.100.10:6642     192.168.100.13:34044    ESTABLISHED  3503/ovsdb-server
> tcp        0      0 192.168.100.12:55480    192.168.100.13:2224     ESTABLISHED  1885/ruby
> tcp        0      0 192.168.100.12:2224     192.168.100.12:48362    ESTABLISHED  1885/ruby
> tcp        0      0 192.168.100.12:2224     192.168.100.13:43488    ESTABLISHED  1885/ruby
> tcp        0      0 192.168.100.10:6641     192.168.100.13:34148    ESTABLISHED  3493/ovsdb-server
> tcp        0      0 192.168.100.10:6642     192.168.100.79:47974    ESTABLISHED  3503/ovsdb-server
> tcp        0      0 192.168.100.10:6641     192.168.100.11:60226    ESTABLISHED  3493/ovsdb-server
> tcp        0      0 192.168.100.10:6642     192.168.100.76:55570    ESTABLISHED  3503/ovsdb-server
> tcp        0      0 192.168.100.10:6642     192.168.100.78:36428    ESTABLISHED  3503/ovsdb-server
> tcp        0      0 192.168.100.10:6642     192.168.100.11:40682    ESTABLISHED  3503/ovsdb-server
> tcp        0      0 192.168.100.10:6642     192.168.100.77:58772    ESTABLISHED  3503/ovsdb-server
> tcp        0      0 192.168.100.12:41974    192.168.100.11:2224     ESTABLISHED  1885/ruby
> tcp6       0      0 :::2224                 :::*                    LISTEN       1885/ruby
>
> Regards,
>
> *Stephen Flynn*
>
> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
