[ClusterLabs] Error observed while starting cluster
From: Roshni Chatterjee

Hi,

The following error is observed in pacemaker / pcs status:

Error: cluster is not currently running on this node

I have built the source code of corosync (2.4.2) and pacemaker (1.1.16) and followed the steps below to build a two-node cluster:

1. Download the source code of corosync and pacemaker (versions as mentioned above) and compile.
2. Install pcsd: yum install pcs
3. Allow cluster services through the firewall: firewall-cmd --permanent --add-service=high-availability
4. Start and enable pcsd: systemctl start pcsd and systemctl enable pcsd
5. Change the password for user hacluster.
6. pcs cluster auth pcmk3 node2
7. pcs cluster setup --name mycluster pcmk3 node2
8. pcs cluster start --all
9. pcs status

No error is reported up to step 8. At step 9, when pcs status is checked, the error below is received:

[root@node2 ~]# pacemakerd --features
Pacemaker 1.1.16 (Build: 94ff4df51a)
 Supporting v3.0.11: agent-manpages libqb-logging libqb-ipc nagios corosync-native atomic-attrd acls
[root@node2 ~]# pcs cluster start --all
pcmk3: Starting Cluster...
node2: Starting Cluster...
[root@node2 ~]# pcs status
Error: cluster is not currently running on this node

On checking the pacemaker service status, the following issue is found:

[root@pcmk3 ~]# systemctl pacemaker status -l
Unknown operation 'pacemaker'.
[root@pcmk3 ~]# systemctl status pacemaker -l
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-03-20 10:55:44 IST; 13min ago
     Docs: man:pacemakerd
           http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
 Main PID: 26932 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─26932 /usr/sbin/pacemakerd -f
           ├─26933 /usr/libexec/pacemaker/cib
           ├─26934 /usr/libexec/pacemaker/stonithd
           ├─26935 /usr/libexec/pacemaker/lrmd
           ├─26936 /usr/libexec/pacemaker/attrd
           └─26937 /usr/libexec/pacemaker/pengine

Mar 20 10:55:45 pcmk3 pacemakerd[26932]: notice: Respawning failed child process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:  error: The crmd process (27035) exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]: notice: Respawning failed child process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:  error: The crmd process (27036) exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]: notice: Respawning failed child process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:  error: The crmd process (27037) exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]: notice: Respawning failed child process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:  error: The crmd process (27038) exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:  error: Child respawn count exceeded by crmd
Mar 20 10:56:21 pcmk3 cib[26933]:  error: Operation ignored, cluster configuration is invalid. Please repair and restart: Update does not conform to the configured schema
[root@pcmk3 ~]#

Corosync.log:

Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: start_child:  Forked child 27035 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: mcp_cpg_deliver:  Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: mcp_cpg_deliver:  Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:  error: pcmk_child_exit:  The crmd process (27035) exited: Key has expired (127)
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: notice: pcmk_process_exit:  Respawning failed child process: crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: start_child:  Using uid=189 and group=189 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: start_child:  Forked child 27036 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: mcp_cpg_deliver:  Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: mcp_cpg_deliver:  Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:  error: pcmk_child_exit:  The crmd process (27036) exited: Key has expired (127)
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: notice: pcmk_process_exit:  Respawning failed child process: crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: start_child:  Using uid=189 and group=189 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   info: start_child:  Forked child 27037 for proc
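[Editor's note: the "Update does not conform to the configured schema" message means the CIB is failing validation against the schema it declares. As a first diagnostic step (a suggestion added here for illustration, not something from the original thread), the live configuration can be checked directly:

# Validate the running CIB against its declared schema
crm_verify --live-check -V

# Show the CIB header, including the validate-with schema version
cibadmin --query | head -n 1
]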
Re: [ClusterLabs] state file not created for Stateful resource agent
On Tue, 20 Mar 2018 13:00:49 -0500, Ken Gaillot wrote:
> On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> > Hi,
> >
> > We have a two-node active/standby cluster with a dummy Stateful
> > resource (pacemaker/Stateful).
> >
> > We observed that when one node is up with the master resource and the
> > other node is booted up, the state file for the dummy resource is not
> > created on the node coming up.
> >
> > /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair name="master-unicloud" value="5"/>
> >
> > Mar 17 12:22:29 [24875] tigana lrmd: notice: operation_finished:
> > unicloud_start_0:25729:stderr [ /usr/lib/ocf/resource.d/pw/uc: line 94:
> > /var/run/uc/role: No such file or directory ]
>
> The resource agent is ocf:pw:uc -- I assume this is a local
> customization of the ocf:pacemaker:Stateful agent?
>
> It looks to me like the /var/run/uc directory is not being created on
> the second node. /var/run is a memory filesystem, so it's wiped at every
> reboot, and any directories need to be created (as root) before they are
> used, every boot.
>
> ocf:pacemaker:Stateful puts its state file directly in /var/run to
> avoid needing to create any directories. You can change that by setting
> the "state" parameter, but in that case you have to make sure the
> directory you specify exists beforehand.

Another way to create the folder at each boot is to ask systemd. E.g.:

cat <<EOF > /etc/tmpfiles.d/ocf-pw-uc.conf
# Directory for ocf:pw:uc resource agent
d /var/run/uc 0700 root root - -
EOF

Adjust the mode and owner to suit your needs.

To take this file into account immediately, without rebooting the server, run the following command:

systemd-tmpfiles --create /etc/tmpfiles.d/ocf-pw-uc.conf

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
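[Editor's note: a further option is to have the agent's own start action create the directory it needs. This is a hypothetical sketch, not from the thread; it assumes the agent exposes its state-file path through the conventional OCF_RESKEY_state variable:

# Inside the agent's start action: derive the state directory from the
# configured state-file path and create it if it does not exist yet.
statedir=$(dirname "${OCF_RESKEY_state}")
[ -d "${statedir}" ] || mkdir -p "${statedir}"
]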
[ClusterLabs] symmetric-cluster=false doesn't work
Hello,

I tried to create an asymmetric cluster via the property symmetric-cluster=false, but my resources try to start on any node, even though I have set locations for them. What did I miss?

cib: https://pastebin.com/AhYqgUdw

Thank you for any help!

Sincerely,
George Melikov
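[Editor's note: for comparison, a minimal sketch of an opt-in setup — the resource and node names here are placeholders, not taken from the pastebin CIB. An opt-in cluster needs both the property and a positive location score for every resource that is supposed to run somewhere:

# Opt-in cluster: by default, resources may run nowhere
pcs property set symmetric-cluster=false

# Explicitly allow MyResource on node1 with a positive score
pcs constraint location MyResource prefers node1=100
]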
Re: [ClusterLabs] state file not created for Stateful resource agent
On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> Hi,
>
> We have a two-node active/standby cluster with a dummy Stateful
> resource (pacemaker/Stateful).
>
> We observed that when one node is up with the master resource and the
> other node is booted up, the state file for the dummy resource is not
> created on the node coming up.
>
> /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']: <nvpair name="master-unicloud" value="5"/>
>
> Mar 17 12:22:29 [24875] tigana lrmd: notice: operation_finished:
> unicloud_start_0:25729:stderr [ /usr/lib/ocf/resource.d/pw/uc: line 94:
> /var/run/uc/role: No such file or directory ]

The resource agent is ocf:pw:uc -- I assume this is a local customization of the ocf:pacemaker:Stateful agent?

It looks to me like the /var/run/uc directory is not being created on the second node. /var/run is a memory filesystem, so it's wiped at every reboot, and any directories need to be created (as root) before they are used, every boot.

ocf:pacemaker:Stateful puts its state file directly in /var/run to avoid needing to create any directories. You can change that by setting the "state" parameter, but in that case you have to make sure the directory you specify exists beforehand.

> This issue is not observed if the secondary does not wait for the cib
> sync and starts the resource on the secondary as well.
>
> We are in the process of upgrading from CentOS 6 to CentOS 7; we never
> observed this issue with CentOS 6 releases.
>
> Attributes for the clone resource: master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1
>
> The setup under observation is:
>
> CentOS Linux release 7.4.1708 (Core)
> corosync-2.4.0-9.el7.x86_64
> pacemaker-1.1.16-12.el7.x86_64
>
> Thanks and Regards,
> Ashutosh

--
Ken Gaillot
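[Editor's note: to illustrate the "state" parameter Ken mentions — a hedged sketch: the path comes from the error log above, but the resource name and the exact pcs invocation are assumptions for pcs 0.9 / pacemaker 1.1:

# Create a Stateful master/slave resource whose state file lives under /var/run/uc
pcs resource create uc-test ocf:pacemaker:Stateful state=/var/run/uc/role --master

# The directory must then exist on every node before the resource starts,
# after every boot (e.g. via the tmpfiles.d snippet suggested in this thread)
mkdir -p /var/run/uc
]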
[ClusterLabs] Colocation constraint for grouping all master-mode stateful resources with important stateless resources
Hi All -

I've implemented a simple two-node cluster with DRBD and a couple of network-based Master/Slave resources. Using the ethmonitor RA, I set up failover whenever the Master/Primary node loses link on the specified ethernet physical device, by constraining the Master role to run only on nodes where the ethmon variable is "1".

Something is going wrong with my colocation constraint, however. If I set up the DRBDFS resource to monitor link on eth1, unplugging eth1 on the Primary node causes a failover as expected: all Master resources are demoted to "slave" and promoted on the opposite node, and the "normal" DRBDFS moves to the other node as expected.

However, if I put the same ethmonitor constraint on the network-based Master/Slave resource, only that specific resource fails over - DRBDFS stays in the same location (though it stops), as do the other Master/Slave resources.

This *smells* like a constraints issue to me - does anyone know what I might be doing wrong?

PCS before:

Cluster name: node1.hostname.com_node2.hostname.com
Stack: corosync
Current DC: node2.hostname.com_0 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Mar 20 16:25:47 2018
Last change: Tue Mar 20 16:00:33 2018 by hacluster via crmd on node2.hostname.com_0

2 nodes configured
11 resources configured

Online: [ node1.hostname.com_0 node2.hostname.com_0 ]

Full list of resources:

 Master/Slave Set: drbd.master [drbd.slave]
     Masters: [ node1.hostname.com_0 ]
     Slaves: [ node2.hostname.com_0 ]
 drbdfs (ocf::heartbeat:Filesystem): Started node1.hostname.com_0
 Master/Slave Set: inside-interface-sameip.master [inside-interface-sameip.slave]
     Masters: [ node1.hostname.com_0 ]
     Slaves: [ node2.hostname.com_0 ]
 Master/Slave Set: outside-interface-sameip.master [outside-interface-sameip.slave]
     Masters: [ node1.hostname.com_0 ]
     Slaves: [ node2.hostname.com_0 ]
 Clone Set: monitor-eth1-clone [monitor-eth1]
     Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
 Clone Set: monitor-eth2-clone [monitor-eth2]
     Started: [ node1.hostname.com_0 node2.hostname.com_0 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: inactive/disabled

PCS after:

Cluster name: node1.hostname.com_node2.hostname.com
Stack: corosync
Current DC: node2.hostname.com_0 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Tue Mar 20 16:29:40 2018
Last change: Tue Mar 20 16:00:33 2018 by hacluster via crmd on node2.hostname.com_0

2 nodes configured
11 resources configured

Online: [ node1.hostname.com_0 node2.hostname.com_0 ]

Full list of resources:

 Master/Slave Set: drbd.master [drbd.slave]
     Masters: [ node1.hostname.com_0 ]
     Slaves: [ node2.hostname.com_0 ]
 drbdfs (ocf::heartbeat:Filesystem): Stopped
 Master/Slave Set: inside-interface-sameip.master [inside-interface-sameip.slave]
     Masters: [ node2.hostname.com_0 ]
     Stopped: [ node1.hostname.com_0 ]
 Master/Slave Set: outside-interface-sameip.master [outside-interface-sameip.slave]
     Masters: [ node1.hostname.com_0 ]
     Slaves: [ node2.hostname.com_0 ]
 Clone Set: monitor-eth1-clone [monitor-eth1]
     Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
 Clone Set: monitor-eth2-clone [monitor-eth2]
     Started: [ node1.hostname.com_0 node2.hostname.com_0 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: inactive/disabled

This is the "constraints" section of my CIB (full CIB is attached):

--
Sam Gardner
Trustwave | SMART SECURITY ON DEMAND

(Attachment: cib-details.xml)
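[Editor's note: a common pattern for keeping everything on the DRBD master is shown below — a hedged sketch using the resource names from the status output above; the scores and exact constraints are assumptions, since the actual constraints section did not survive in the archived post:

# The filesystem must run where the DRBD master is promoted
pcs constraint colocation add drbdfs with master drbd.master INFINITY

# Tie the other master roles to the DRBD master as well,
# so that all of them promote and demote together
pcs constraint colocation add master inside-interface-sameip.master with master drbd.master INFINITY
pcs constraint colocation add master outside-interface-sameip.master with master drbd.master INFINITY
]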