Thanks Numan. In my environment it's even worse: the resource does not get started at all, and the monitor is called only once instead of repeatedly, whether for the master role, the slave role, or neither. Do you know of any problem that could cause pacemaker to make this decision? The other resources are fine.
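
For reading the monitor results in my log quoted further down ("monitor is going to return 7"), here is a small helper I find handy. It is only a sketch: the function name ocf_rc_name is made up for illustration and is not part of the ovndb-servers script; the numeric values are the standard OCF resource agent return codes as I understand them.

```shell
#!/bin/sh
# Map an OCF resource-agent exit code to its symbolic name.
# ocf_rc_name is a hypothetical helper, not part of the OVN OCF script.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Decode the value reported by the monitor op in my log.
ocf_rc_name 7
```

So the single monitor call I see reports the resource as cleanly stopped, which as far as I know is what an initial probe is expected to return before a start; what I don't understand is why no start or promote follows it.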

On Fri, Dec 1, 2017 at 2:08 AM, Numan Siddique <[email protected]> wrote:

> Hi HuiXiang,
> Even I am seeing the issue where no node is promoted as master. I will
> test more, fix and and submit patch set v3.
>
> Thanks
> Numan
>
> On Thu, Nov 30, 2017 at 4:10 PM, Numan Siddique <[email protected]>
> wrote:
>
>> On Thu, Nov 30, 2017 at 1:15 PM, Hui Xiang <[email protected]> wrote:
>>
>>> Hi Numan,
>>>
>>> Thanks for helping, I am following your pcs example, but still with no
>>> lucky,
>>>
>>> 1. Before running any configuration, I stopped all of the ovsdb-server
>>> for OVN, and ovn-northd. Deleted ovnnb_active.conf/ovnsb_active.conf.
>>>
>>> 2. Since I have already had an vip in the cluster, so I chose to use it,
>>> it's status is OK.
>>> [root@node-1 ~]# pcs resource show
>>> vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>>
>>> 3. Use pcs to create ovndb-servers and constraint
>>> [root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
>>> manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
>>> sb_master_port=6642 master
>>> ([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>>> Error: unable to find a resource/clone/master/group:
>>> tst-ovndb-master) ## returned error, so I changed into below command.
>>>
>>
>> Hi HuiXiang,
>> This command is very important. Without which, pacemaker do not notify
>> the status change and ovsdb-servers would not be promoted or demoted.
>> Hence you don't see the notify action getting called in ovn ocf script.
>>
>> Can you try with the other command which I shared in my previous email.
>> These commands work fine for me.
>>
>> Let me know how it goes.
>>
>> Thanks
>> Numan
>>
>> [root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
>>> notify=true
>>> [root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master
>>> with vip__management_old
>>>
>>> 4. pcs status
>>> [root@node-1 ~]# pcs status
>>> vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>> Master/Slave Set: tst-ovndb-master [tst-ovndb]
>>> Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>
>>> 5. pcs resource show XXX
>>> [root@node-1 ~]# pcs resource show vip__management_old
>>> Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>>> Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
>>> ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
>>> gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
>>> iptables_comment=default-comment
>>> Meta Attrs: migration-threshold=3 failure-timeout=60
>>> resource-stickiness=1
>>> Operations: monitor interval=3 timeout=30
>>> (vip__management_old-monitor-3)
>>> start interval=0 timeout=30 (vip__management_old-start-0)
>>> stop interval=0 timeout=30 (vip__management_old-stop-0)
>>> [root@node-1 ~]# pcs resource show tst-ovndb-master
>>> Master: tst-ovndb-master
>>> Meta Attrs: notify=true
>>> Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>>> Attributes: manage_northd=yes master_ip=192.168.0.2
>>> nb_master_port=6641 sb_master_port=6642
>>> Operations: start interval=0s timeout=30s
>>> (tst-ovndb-start-timeout-30s)
>>> stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>>> promote interval=0s timeout=50s
>>> (tst-ovndb-promote-timeout-50s)
>>> demote interval=0s timeout=50s
>>> (tst-ovndb-demote-timeout-50s)
>>> monitor interval=30s timeout=20s
>>> (tst-ovndb-monitor-interval-30s)
>>> monitor interval=10s role=Master timeout=20s
>>> (tst-ovndb-monitor-interval-10s-role-Master)
>>> monitor interval=30s role=Slave timeout=20s
>>> (tst-ovndb-monitor-interval-30s-role-Slave)
>>>
>>> 6. I have put log in every ovndb-servers op, seems only the monitor op
>>> is being called, no promoted by the pacemaker DC:
>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>> ovsdb_server_monitor
>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>> ovsdb_server_check_status
>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>> return OCFOCF_NOT_RUNNINGG
>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>> ovsdb_server_master_update: 7}
>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>> ovsdb_server_master_update end}
>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>> monitor is going to return 7
>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO: metadata
>>> exit OCF_SUCCESS}
>>>
>>> Please take a look, thank you very much.
>>> Hui.
>>>
>>> On Wed, Nov 29, 2017 at 11:03 PM, Numan Siddique <[email protected]>
>>> wrote:
>>>
>>>> On Wed, Nov 29, 2017 at 4:16 PM, Hui Xiang <[email protected]> wrote:
>>>>
>>>>> FYI, If I have configured a good ovndb-server cluster with one active
>>>>> two slaves, then start pacemaker ovn-servers resource agents, they are all
>>>>> becoming slaves...
>>>>>
>>>>
>>>> You don't need to start ovndb-servers. When you create pacemaker
>>>> resources it would automatically start them and promote on of them.
>>>>
>>>> One thing which is very important is to create an IPaddr2 resource
>>>> before and add a colocation constraint so that pacemaker would promote the
>>>> ovsdb-server in the node
>>>> where IPaddr2 resource is running. This IPaddr2 resource ip should be
>>>> your master ip.
>>>>
>>>> Can you please do "pcs resource show <name_of_the_resource>" and share
>>>> the output ?
>>>>
>>>> Below is how I normally use for my testing.
>>>>
>>>> ############
>>>> pcs cluster cib tmp-cib.xml
>>>> cp tmp-cib.xml tmp-cib.xml.deltasrc
>>>>
>>>> pcs -f tmp-cib.xml resource create tst-ovndb ocf:ovn:ovndb-servers
>>>> manage_northd=yes master_ip=192.168.24.10 nb_master_port=6641
>>>> sb_master_port=6642 master
>>>> pcs -f tmp-cib.xml resource meta tst-ovndb-master notify=true
>>>> pcs -f tmp-cib.xml constraint colocation add master tst-ovndb-master
>>>> with ip-192.168.24.10
>>>>
>>>> pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
>>>> pcs status
>>>> ##############
>>>>
>>>> In the above example, "ip-192.168.24.10" is the IPaddr2 resource.
>>>>
>>>> Thanks
>>>> Numan
>>>>
>>>>>
>>>>> On Tue, Nov 28, 2017 at 10:48 PM, Numan Siddique <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Nov 28, 2017 at 2:29 PM, Hui Xiang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Numan,
>>>>>>>
>>>>>>> Finally figure it out what's wrong when running ovndb-servers ocf in
>>>>>>> my environment.
>>>>>>>
>>>>>>> 1. There is no default ovnnb and ovnsb running in my environment, I
>>>>>>> thought it should be started by pacemaker as the usual way other typical
>>>>>>> resource agent do it.
>>>>>>> when I create the ovndb_servers resource, nothing happened, no
>>>>>>> operation is executed except monitor, which is really hard to debug for a
>>>>>>> while.
>>>>>>> In the ovsdb_server_monitor() function, first it will check the
>>>>>>> status, here, it will be return NOT_RUNNING, then in
>>>>>>> the ovsdb_server_master_update() function, "CRM_MASTER -D" is being
>>>>>>> executed, which appears stopped every following action, I am not very clear
>>>>>>> what work it did.
>>>>>>>
>>>>>>> So, do the ovn_nb and ovn_sb needs to be running previouly before
>>>>>>> pacemaker ovndb_servers resource create? Is there any such documentation
>>>>>>> referred?
>>>>>>>
>>>>>> No they don't need to be.
>>>>>>
>>>>>>> 2. Without your patch every nodes executing ovsdb_server_monitor and
>>>>>>> return OCF_SUCCESS
>>>>>>> However, the first node of the three nodes cluster is executed
>>>>>>> ovsdb_server_stop action, the reason showed below:
>>>>>>> <27>Nov 28 15:35:11 node-1 pengine[1897010]: error: clone_color:
>>>>>>> ovndb_servers:0 is running on node-1.domain.tld which isn't allowed
>>>>>>> Did I miss anything? I don't understand why it isn't allowed.
>>>>>>>
>>>>>>> 3. Regard your patch[1]
>>>>>>> It first reports "/usr/lib/ocf/resource.d/ovn/ovndb-servers: line
>>>>>>> 26: ocf_attribute_target: command not found ]" in my
>>>>>>> environment(pacemaker 1.1.12)
>>>>>>>
>>>>>>
>>>>>> Thanks. I will come back to you on your other points. The function
>>>>>> "ocf_attribute_target" action must be added in 1.1.16-12.
>>>>>>
>>>>>> I think it makes sense to either remove "ocf_attribute_target" or
>>>>>> find a way so that even older versions work.
>>>>>>
>>>>>> I will spin a v2.
>>>>>> Thanks
>>>>>> Numan
>>>>>>
>>>>>> The log showed same as item2, but I have seen very shortly different
>>>>>>> state from "pcs status" as below shown:
>>>>>>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>> Slaves: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>>>> There is no promote action being executed.
>>>>>>>
>>>>>>> Thanks for looking and help.
>>>>>>>
>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>>
>>>>>>> On Fri, Nov 24, 2017 at 10:54 PM, Numan Siddique <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Hui Xiang,
>>>>>>>>
>>>>>>>> Can you please try with this patch [1] and see if it works for you
>>>>>>>> ? Please let me know how it goes. But I am not sure, if the patch
>>>>>>>> would fix the issue.
>>>>>>>>
>>>>>>>> To brief, the OVN OCF script doesn't add monitor action for
>>>>>>>> "Master" role. So pacemaker Resource agent would not check for the
>>>>>>>> status of ovn db servers periodically. In case ovn db servers are killed,
>>>>>>>> pacemaker wont know about it.
>>>>>>>>
>>>>>>>> You can also take a look at this [1] to know how it is used in
>>>>>>>> openstack with tripleo installation.
>>>>>>>>
>>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>>> [2] - https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Numan
>>>>>>>>
>>>>>>>> On Fri, Nov 24, 2017 at 3:00 PM, Hui Xiang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> I am following what suggested on doc[1] to configure the
>>>>>>>>> ovndb_servers HA, however, it's so unluck with upgrading pacemaker
>>>>>>>>> packages
>>>>>>>>> from 1.12 to 1.16, do almost every kind of changes, there still not a
>>>>>>>>> ovndb_servers master promoted, is there any special recipe for it to
>>>>>>>>> run?
>>>>>>>>> so frustrated on it, sigh.
>>>>>>>>>
>>>>>>>>> It always showed:
>>>>>>>>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>> Stopped: [ node-1.domain.tld node-2.domain.tld
>>>>>>>>> node-3.domain.tld ]
>>>>>>>>>
>>>>>>>>> Even if I tried below steps:
>>>>>>>>> 1. pcs resource debug-stop ovndb_server on every nodes.
>>>>>>>>> ovn-ctl status_ovnxb: running/backup
>>>>>>>>> 2. pcs resource debug-start ovndb_server on every nodes.
>>>>>>>>> ovn-ctl status_ovnxb: running/backup
>>>>>>>>> 3. pcs resource debug-promote ovndb_server on one nodes. ovn-ctl
>>>>>>>>> status_ovnxb: running/active
>>>>>>>>>
>>>>>>>>> With above status, the pcs status still showed as:
>>>>>>>>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>> Stopped: [ node-1.domain.tld node-2.domain.tld
>>>>>>>>> node-3.domain.tld ]
>>>>>>>>>
>>>>>>>>> [1]. https://github.com/openvswitch/ovs/blob/master/Documentation/topics/integration.rst
>>>>>>>>>
>>>>>>>>> Appreciated any hint.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> discuss mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
