Hi HuiXiang,

I am also seeing the issue where no node is promoted as master. I will test more, fix it, and submit patch set v3.
Thanks
Numan

On Thu, Nov 30, 2017 at 4:10 PM, Numan Siddique <[email protected]> wrote:

>
> On Thu, Nov 30, 2017 at 1:15 PM, Hui Xiang <[email protected]> wrote:
>
>> Hi Numan,
>>
>> Thanks for helping. I am following your pcs example, but still with no
>> luck.
>>
>> 1. Before running any configuration, I stopped all of the ovsdb-server
>> for OVN, and ovn-northd. Deleted ovnnb_active.conf/ovnsb_active.conf.
>>
>> 2. Since I already had a VIP in the cluster, I chose to use it;
>> its status is OK.
>> [root@node-1 ~]# pcs resource show
>> vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>
>> 3. Use pcs to create ovndb-servers and constraint
>> [root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
>> manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
>> sb_master_port=6642 master
>> ([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>> Error: unable to find a resource/clone/master/group:
>> tst-ovndb-master) ## returned error, so I changed to the command below.
>>
>
> Hi HuiXiang,
> This command is very important. Without it, pacemaker does not notify the
> status change and ovsdb-servers would not be promoted or demoted.
> Hence you don't see the notify action getting called in the ovn ocf script.
>
> Can you try with the other command which I shared in my previous email?
> These commands work fine for me.
>
> Let me know how it goes.
>
> Thanks
> Numan
>
>> [root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
>> notify=true
>> [root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master
>> with vip__management_old
>>
>> 4. pcs status
>> [root@node-1 ~]# pcs status
>> vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>> Master/Slave Set: tst-ovndb-master [tst-ovndb]
>> Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>
>> 5. pcs resource show XXX
>> [root@node-1 ~]# pcs resource show vip__management_old
>> Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>> Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
>> ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
>> gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
>> iptables_comment=default-comment
>> Meta Attrs: migration-threshold=3 failure-timeout=60
>> resource-stickiness=1
>> Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3)
>> start interval=0 timeout=30 (vip__management_old-start-0)
>> stop interval=0 timeout=30 (vip__management_old-stop-0)
>> [root@node-1 ~]# pcs resource show tst-ovndb-master
>> Master: tst-ovndb-master
>> Meta Attrs: notify=true
>> Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>> Attributes: manage_northd=yes master_ip=192.168.0.2
>> nb_master_port=6641 sb_master_port=6642
>> Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
>> stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>> promote interval=0s timeout=50s
>> (tst-ovndb-promote-timeout-50s)
>> demote interval=0s timeout=50s
>> (tst-ovndb-demote-timeout-50s)
>> monitor interval=30s timeout=20s
>> (tst-ovndb-monitor-interval-30s)
>> monitor interval=10s role=Master timeout=20s
>> (tst-ovndb-monitor-interval-10s-role-Master)
>> monitor interval=30s role=Slave timeout=20s
>> (tst-ovndb-monitor-interval-30s-role-Slave)
>>
>> 6. I have put a log in every ovndb-servers op; it seems only the monitor
>> op is being called, with no promote by the pacemaker DC:
>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>> ovsdb_server_monitor
>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>> ovsdb_server_check_status
>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>> return OCFOCF_NOT_RUNNINGG
>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>> ovsdb_server_master_update: 7}
>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>> ovsdb_server_master_update end}
>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>> monitor is going to return 7
>> <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata
>> exit OCF_SUCCESS}
>>
>> Please take a look, thank you very much.
>> Hui.
>>
>> On Wed, Nov 29, 2017 at 11:03 PM, Numan Siddique <[email protected]>
>> wrote:
>>
>>>
>>> On Wed, Nov 29, 2017 at 4:16 PM, Hui Xiang <[email protected]> wrote:
>>>
>>>> FYI, if I have configured a good ovndb-server cluster with one active
>>>> and two slaves, and then start the pacemaker ovn-servers resource
>>>> agents, they all become slaves...
>>>>
>>>
>>> You don't need to start ovndb-servers. When you create pacemaker
>>> resources it would automatically start them and promote one of them.
>>>
>>> One thing which is very important is to create an IPaddr2 resource
>>> beforehand and add a colocation constraint so that pacemaker would
>>> promote the ovsdb-server in the node where the IPaddr2 resource is
>>> running. This IPaddr2 resource ip should be your master ip.
>>>
>>> Can you please do "pcs resource show <name_of_the_resource>" and share
>>> the output ?
>>>
>>> Below is what I normally use for my testing.
>>>
>>> ############
>>> pcs cluster cib tmp-cib.xml
>>> cp tmp-cib.xml tmp-cib.xml.deltasrc
>>>
>>> pcs -f tmp-cib.xml resource create tst-ovndb ocf:ovn:ovndb-servers
>>> manage_northd=yes master_ip=192.168.24.10 nb_master_port=6641
>>> sb_master_port=6642 master
>>> pcs -f tmp-cib.xml resource meta tst-ovndb-master notify=true
>>> pcs -f tmp-cib.xml constraint colocation add master tst-ovndb-master
>>> with ip-192.168.24.10
>>>
>>> pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
>>> pcs status
>>> ##############
>>>
>>> In the above example, "ip-192.168.24.10" is the IPaddr2 resource.
>>>
>>> Thanks
>>> Numan
>>>
>>>>
>>>> On Tue, Nov 28, 2017 at 10:48 PM, Numan Siddique <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> On Tue, Nov 28, 2017 at 2:29 PM, Hui Xiang <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Numan,
>>>>>>
>>>>>> I finally figured out what's wrong when running the ovndb-servers
>>>>>> ocf in my environment.
>>>>>>
>>>>>> 1. There is no default ovnnb and ovnsb running in my environment; I
>>>>>> thought it should be started by pacemaker in the usual way other
>>>>>> typical resource agents do it.
>>>>>> When I create the ovndb_servers resource, nothing happens: no
>>>>>> operation is executed except monitor, which was really hard to debug
>>>>>> for a while.
>>>>>> In the ovsdb_server_monitor() function, first it will check the
>>>>>> status; here, it will return NOT_RUNNING. Then in the
>>>>>> ovsdb_server_master_update() function, "CRM_MASTER -D" is
>>>>>> executed, which appears to stop every following action. I am not
>>>>>> very clear what work it does.
>>>>>>
>>>>>> So, do ovn_nb and ovn_sb need to be running before the
>>>>>> pacemaker ovndb_servers resource is created? Is there any
>>>>>> documentation on this?
>>>>>>
>>>>> No they don't need to be.
>>>
>>>>
>>>>>> 2. Without your patch, every node executes ovsdb_server_monitor and
>>>>>> returns OCF_SUCCESS.
>>>>>> However, the first node of the three-node cluster executes the
>>>>>> ovsdb_server_stop action; the reason is shown below:
>>>>>> <27>Nov 28 15:35:11 node-1 pengine[1897010]: error: clone_color:
>>>>>> ovndb_servers:0 is running on node-1.domain.tld which isn't allowed
>>>>>> Did I miss anything? I don't understand why it isn't allowed.
>>>>>>
>>>>>> 3. Regarding your patch [1]
>>>>>> It first reports "/usr/lib/ocf/resource.d/ovn/ovndb-servers: line
>>>>>> 26: ocf_attribute_target: command not found ]" in my
>>>>>> environment (pacemaker 1.1.12)
>>>>>>
>>>>>
>>>>> Thanks. I will come back to you on your other points. The function
>>>>> "ocf_attribute_target" must have been added in 1.1.16-12.
>>>>>
>>>>> I think it makes sense to either remove "ocf_attribute_target" or find
>>>>> a way so that even older versions work.
>>>>>
>>>>> I will spin a v2.
>>>>> Thanks
>>>>> Numan
>>>>>
>>>>>> The log showed the same as item 2, but I very briefly saw a different
>>>>>> state from "pcs status", as shown below:
>>>>>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>> Slaves: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>>> There is no promote action being executed.
>>>>>>
>>>>>> Thanks for looking and for the help.
>>>>>>
>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>
>>>>>> On Fri, Nov 24, 2017 at 10:54 PM, Numan Siddique <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Hui Xiang,
>>>>>>>
>>>>>>> Can you please try with this patch [1] and see if it works for you?
>>>>>>> Please let me know how it goes. But I am not sure if the patch would
>>>>>>> fix the issue.
>>>>>>>
>>>>>>> In brief, the OVN OCF script doesn't add a monitor action for the
>>>>>>> "Master" role. So the pacemaker resource agent would not check the
>>>>>>> status of the ovn db servers periodically. In case the ovn db
>>>>>>> servers are killed, pacemaker won't know about it.
>>>>>>>
>>>>>>> You can also take a look at this [2] to see how it is used in
>>>>>>> openstack with a tripleo installation.
>>>>>>>
>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>> [2] - https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp
>>>>>>>
>>>>>>> Thanks
>>>>>>> Numan
>>>>>>>
>>>>>>> On Fri, Nov 24, 2017 at 3:00 PM, Hui Xiang <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> I am following what is suggested in doc [1] to configure
>>>>>>>> ovndb_servers HA. However, I have had no luck: with upgrading the
>>>>>>>> pacemaker packages from 1.12 to 1.16 and making almost every kind
>>>>>>>> of change, there is still no ovndb_servers master promoted. Is
>>>>>>>> there any special recipe for it to run? So frustrated with it, sigh.
>>>>>>>>
>>>>>>>> It always showed:
>>>>>>>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>> Stopped: [ node-1.domain.tld node-2.domain.tld
>>>>>>>> node-3.domain.tld ]
>>>>>>>>
>>>>>>>> Even if I tried the steps below:
>>>>>>>> 1. pcs resource debug-stop ovndb_server on every node.
>>>>>>>> ovn-ctl status_ovnxb: running/backup
>>>>>>>> 2. pcs resource debug-start ovndb_server on every node.
>>>>>>>> ovn-ctl status_ovnxb: running/backup
>>>>>>>> 3. pcs resource debug-promote ovndb_server on one node. ovn-ctl
>>>>>>>> status_ovnxb: running/active
>>>>>>>>
>>>>>>>> With the above status, pcs status still showed:
>>>>>>>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>> Stopped: [ node-1.domain.tld node-2.domain.tld
>>>>>>>> node-3.domain.tld ]
>>>>>>>>
>>>>>>>> [1]. https://github.com/openvswitch/ovs/blob/master/Documentation/topics/integration.rst
>>>>>>>>
>>>>>>>> I would appreciate any hint.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> discuss mailing list
>>>>>>>> [email protected]
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>>>
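
On the "ocf_attribute_target: command not found" error reported with pacemaker 1.1.12 in the quoted thread, one possible way to keep older installations working is to fall back to another node-name lookup when that function is missing. The following is only a sketch under assumptions, not the fix from any posted patch: the helper name ovn_attribute_target is hypothetical, and it assumes the OCF shell library has already been sourced and may or may not define ocf_attribute_target and ocf_local_nodename.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OVN OCF script discussed in this
# thread): print the node name to use for attribute updates, falling back
# when the installed OCF shell library lacks ocf_attribute_target.
ovn_attribute_target() {
    if command -v ocf_attribute_target >/dev/null 2>&1; then
        # Newer OCF shell libraries provide this function.
        ocf_attribute_target
    elif command -v ocf_local_nodename >/dev/null 2>&1; then
        # Older helper from the OCF shell functions.
        ocf_local_nodename
    else
        # Last resort: the node name reported by the kernel.
        uname -n
    fi
}
```

A script could then call "$(ovn_attribute_target)" wherever it would otherwise call ocf_attribute_target directly. Whether the fallback name is the right attribute target in every deployment is an assumption that would need checking.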
