Re: [ClusterLabs] state file not created for Stateful resource agent

2018-03-20 Thread ashutosh tiwari

[ClusterLabs] Error observed while starting cluster

2018-03-20 Thread Roshni Chatterjee
Hi,

The following error is observed in pacemaker and pcs status:
Error: cluster is not currently running on this node

I have built the source code of corosync (2.4.2) and pacemaker (1.1.16) and
have followed the steps below to build a two-node cluster.


1.   Download the source code of corosync and pacemaker (versions as mentioned above) and compile.

2.   Install pcsd using “yum install pcs”

3.   Allow cluster services through firewall using #firewall-cmd 
--permanent --add-service=high-availability

4.   Start and enable pcsd #systemctl start pcsd and #systemctl enable pcsd

5.   Change password for user hacluster

6.   pcs cluster auth pcmk3 node2

7.   pcs cluster setup --name mycluster pcmk3 node2

8.   pcs cluster start --all

9.   pcs status
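
For reference, steps 2-8 consolidated into a plain shell sketch (node names
pcmk3 and node2 as above; corosync and pacemaker are assumed to be already
built and installed from source on both nodes):

  # On every node:
  yum install -y pcs
  firewall-cmd --permanent --add-service=high-availability
  firewall-cmd --reload
  systemctl start pcsd
  systemctl enable pcsd
  passwd hacluster          # same 'hacluster' password on every node

  # On one node only:
  pcs cluster auth pcmk3 node2
  pcs cluster setup --name mycluster pcmk3 node2
  pcs cluster start --all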


It is observed that no error is received up to step 8. At step 9, when pcs
status is checked, the error below is received:
[root@node2 ~]# pacemakerd --features
Pacemaker 1.1.16 (Build: 94ff4df51a)
Supporting v3.0.11:  agent-manpages libqb-logging libqb-ipc nagios  
corosync-native atomic-attrd acls
[root@node2 ~]# pcs cluster start --all
pcmk3: Starting Cluster...
node2: Starting Cluster...
[root@node2 ~]# pcs status
Error: cluster is not currently running on this node

On checking the pacemaker status, the following issue is found:
[root@pcmk3 ~]# systemctl pacemaker status -l
Unknown operation 'pacemaker'.
[root@pcmk3 ~]# systemctl status pacemaker -l
● pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor 
preset: disabled)
   Active: active (running) since Tue 2018-03-20 10:55:44 IST; 13min ago
 Docs: man:pacemakerd
   
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
Main PID: 26932 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
   ├─26932 /usr/sbin/pacemakerd -f
   ├─26933 /usr/libexec/pacemaker/cib
   ├─26934 /usr/libexec/pacemaker/stonithd
   ├─26935 /usr/libexec/pacemaker/lrmd
   ├─26936 /usr/libexec/pacemaker/attrd
   └─26937 /usr/libexec/pacemaker/pengine

Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child 
process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:error: The crmd process (27035) 
exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child 
process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:error: The crmd process (27036) 
exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child 
process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:error: The crmd process (27037) 
exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:   notice: Respawning failed child 
process: crmd
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:error: The crmd process (27038) 
exited: Key has expired (127)
Mar 20 10:55:45 pcmk3 pacemakerd[26932]:error: Child respawn count exceeded 
by crmd
Mar 20 10:56:21 pcmk3 cib[26933]:error: Operation ignored, cluster 
configuration is invalid. Please repair and restart: Update does not conform to 
the configured schema
[root@pcmk3 ~]#

Corosync.log
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: start_child:Forked 
child 27035 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:error: pcmk_child_exit:The 
crmd process (27035) exited: Key has expired (127)
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice: pcmk_process_exit:  
Respawning failed child process: crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: start_child:Using 
uid=189 and group=189 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: start_child:Forked 
child 27036 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: mcp_cpg_deliver:
Ignoring process list sent by peer for local node
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:error: pcmk_child_exit:The 
crmd process (27036) exited: Key has expired (127)
Mar 20 10:55:45 [26932] pcmk3 pacemakerd:   notice: pcmk_process_exit:  
Respawning failed child process: crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: start_child:Using 
uid=189 and group=189 for process crmd
Mar 20 10:55:45 [26932] pcmk3 pacemakerd: info: start_child:Forked 
child 27037 for proc

Re: [ClusterLabs] state file not created for Stateful resource agent

2018-03-20 Thread Jehan-Guillaume de Rorthais
On Tue, 20 Mar 2018 13:00:49 -0500
Ken Gaillot  wrote:

> On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> > Hi,
> > 
> > 
> > We have a two-node active/standby cluster with a dummy Stateful
> > resource (pacemaker/Stateful).
> > 
> > We observed that when one node is up with the master resource and the
> > other node is booted up, the state file for the dummy resource is not
> > created on the node coming up.
> > 
> > /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']:
> >   <nvpair name="master-unicloud" value="5"/>
> > Mar 17 12:22:29 [24875] tigana       lrmd:   notice:
> > operation_finished:        unicloud_start_0:25729:stderr [
> > /usr/lib/ocf/resource.d/pw/uc: line 94: /var/run/uc/role: No such
> > file or directory ]
> 
> The resource agent is ocf:pw:uc -- I assume this is a local
> customization of the ocf:pacemaker:Stateful agent?
> 
> It looks to me like the /var/run/uc directory is not being created on
> the second node. /var/run is a memory filesystem, so it's wiped at
> every reboot, and any directories need to be created (as root) before
> they are used, every boot.
> 
> ocf:pacemaker:Stateful puts its state file directly in /var/run to
> avoid needing to create any directories. You can change that by setting
> the "state" parameter, but in that case you have to make sure the
> directory you specify exists beforehand.

Another way to create the folder at each boot is to ask systemd.

E.g.:

  cat <<EOF > /etc/tmpfiles.d/ocf-pw-uc.conf
  # Directory for ocf:pw:uc resource agent
  d /var/run/uc 0700 root root - -
  EOF

Adjust the permissions and owner to suit your needs.

To apply this file immediately, without rebooting the server, run the
following command:

  systemd-tmpfiles --create /etc/tmpfiles.d/ocf-pw-uc.conf
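
The directory should then exist with the requested mode and owner, which can
be checked with:

  ls -ld /var/run/uc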

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo


[ClusterLabs] symmetric-cluster=false doesn't work

2018-03-20 Thread George Melikov
Hello,

I tried to create an asymmetric cluster via the property
symmetric-cluster=false, but my resources try to start on any node, though I
have set locations for them. What did I miss?

cib: https://pastebin.com/AhYqgUdw

Thank you for any help!

Sincerely,
George Melikov
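
For reference, the kind of setup described here (opt-in cluster plus explicit
location constraints) is typically expressed with commands along these lines;
the resource and node names below are placeholders, not taken from the
pastebin:

  pcs property set symmetric-cluster=false
  pcs constraint location my-resource prefers node1=100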


Re: [ClusterLabs] state file not created for Stateful resource agent

2018-03-20 Thread Ken Gaillot
On Sat, 2018-03-17 at 15:35 +0530, ashutosh tiwari wrote:
> Hi,
> 
> 
> We have a two-node active/standby cluster with a dummy Stateful
> resource (pacemaker/Stateful).
> 
> We observed that when one node is up with the master resource and the
> other node is booted up, the state file for the dummy resource is not
> created on the node coming up.
> 
> /cib/status/node_state[@id='2']/transient_attributes[@id='2']/instance_attributes[@id='status-2']:
>   <nvpair name="master-unicloud" value="5"/>
> Mar 17 12:22:29 [24875] tigana       lrmd:   notice:
> operation_finished:        unicloud_start_0:25729:stderr [
> /usr/lib/ocf/resource.d/pw/uc: line 94: /var/run/uc/role: No such
> file or directory ]

The resource agent is ocf:pw:uc -- I assume this is a local
customization of the ocf:pacemaker:Stateful agent?

It looks to me like the /var/run/uc directory is not being created on
the second node. /var/run is a memory filesystem, so it's wiped at
every reboot, and any directories need to be created (as root) before
they are used, every boot.

ocf:pacemaker:Stateful puts its state file directly in /var/run to
avoid needing to create any directories. You can change that by setting
the "state" parameter, but in that case you have to make sure the
directory you specify exists beforehand.
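
For illustration, a minimal sketch of that approach, assuming the primitive is
named "uc" and the /var/run/uc path from the log above is the intended
location; adjust both to the real configuration:

  # Pre-create the directory on every node. /var/run is wiped at boot, so this
  # must be repeated on each boot (or handled via systemd tmpfiles.d, as shown
  # elsewhere in this thread).
  install -d -m 0755 -o root -g root /var/run/uc

  # Point the agent's "state" parameter at a file inside that directory.
  pcs resource update uc state=/var/run/uc/role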

> This issue is not observed if the secondary does not wait for CIB
> sync and starts the resource on the secondary as well.
> 
> We are in the process of upgrading from CentOS 6 to CentOS 7; we never
> observed this issue with the CentOS 6 releases.
> 
> Attributes for the clone resource: master-max=1 master-node-max=1
> clone-max=2 clone-node-max=1
> 
> The setup under observation is:
> 
> CentOS Linux release 7.4.1708 (Core)
> corosync-2.4.0-9.el7.x86_64
> pacemaker-1.1.16-12.el7.x86_64.
> 
> 
> Thanks and Regards,
> Ashutosh 
> 
-- 
Ken Gaillot 


[ClusterLabs] Colocation constraint for grouping all master-mode stateful resources with important stateless resources

2018-03-20 Thread Sam Gardner
Hi All -

I've implemented a simple two-node cluster with DRBD and a couple of 
network-based Master/Slave resources.

Using the ethmonitor RA, I set up failover whenever the Master/Primary node
loses link on the specified physical Ethernet device, by constraining the
Master role to only those nodes where the ethmonitor node attribute is "1".
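
For context, a rule of that kind expressed with pcs would look roughly like
the sketch below; the resource name is taken from the status output further
down, while the attribute name is an assumption (by default the ethmonitor
agent publishes a node attribute named after the interface, e.g.
ethmonitor-eth1):

  pcs constraint location inside-interface-sameip.master \
      rule role=master score=-INFINITY ethmonitor-eth1 ne 1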

Something is going wrong with my colocation constraint, however - if I set up 
the DRBDFS resource to monitor link on eth1, unplugging eth1 on the Primary 
node causes a failover as expected - all Master resources are demoted to 
"slave" and promoted on the opposite node, and the "normal" DRBDFS moves to the 
other node as expected.

However, if I put the same ethmonitor constraint on the network-based 
Master/Slave resource, only that specific resource fails over - DRBDFS stays in 
the same location (though it stops) as do the other Master/Slave resources.

This *smells* like a constraints issue to me - does anyone know what I might be 
doing wrong?

PCS before:
Cluster name: node1.hostname.com_node2.hostname.com
Stack: corosync
Current DC: node2.hostname.com_0 (version 1.1.16-12.el7_4.4-94ff4df) - 
partition with quorum
Last updated: Tue Mar 20 16:25:47 2018
Last change: Tue Mar 20 16:00:33 2018 by hacluster via crmd on 
node2.hostname.com_0

2 nodes configured
11 resources configured

Online: [ node1.hostname.com_0 node2.hostname.com_0 ]

Full list of resources:

 Master/Slave Set: drbd.master [drbd.slave]
 Masters: [ node1.hostname.com_0 ]
 Slaves: [ node2.hostname.com_0 ]
 drbdfs (ocf::heartbeat:Filesystem):Started node1.hostname.com_0
 Master/Slave Set: inside-interface-sameip.master 
[inside-interface-sameip.slave]
 Masters: [ node1.hostname.com_0 ]
 Slaves: [ node2.hostname.com_0 ]
 Master/Slave Set: outside-interface-sameip.master 
[outside-interface-sameip.slave]
 Masters: [ node1.hostname.com_0 ]
 Slaves: [ node2.hostname.com_0 ]
 Clone Set: monitor-eth1-clone [monitor-eth1]
 Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
 Clone Set: monitor-eth2-clone [monitor-eth2]
 Started: [ node1.hostname.com_0 node2.hostname.com_0 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: inactive/disabled

PCS after:
Cluster name: node1.hostname.com_node2.hostname.com
Stack: corosync
Current DC: node2.hostname.com_0 (version 1.1.16-12.el7_4.4-94ff4df) - 
partition with quorum
Last updated: Tue Mar 20 16:29:40 2018
Last change: Tue Mar 20 16:00:33 2018 by hacluster via crmd on 
node2.hostname.com_0

2 nodes configured
11 resources configured

Online: [ node1.hostname.com_0 node2.hostname.com_0 ]

Full list of resources:

 Master/Slave Set: drbd.master [drbd.slave]
 Masters: [ node1.hostname.com_0 ]
 Slaves: [ node2.hostname.com_0 ]
 drbdfs (ocf::heartbeat:Filesystem):Stopped
 Master/Slave Set: inside-interface-sameip.master 
[inside-interface-sameip.slave]
 Masters: [ node2.hostname.com_0 ]
 Stopped: [ node1.hostname.com_0 ]
 Master/Slave Set: outside-interface-sameip.master 
[outside-interface-sameip.slave]
 Masters: [ node1.hostname.com_0 ]
 Slaves: [ node2.hostname.com_0 ]
 Clone Set: monitor-eth1-clone [monitor-eth1]
 Started: [ node1.hostname.com_0 node2.hostname.com_0 ]
 Clone Set: monitor-eth2-clone [monitor-eth2]
 Started: [ node1.hostname.com_0 node2.hostname.com_0 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: inactive/disabled

This is the "constraints" section of my CIB (full CIB is attached):
  

  


  
  
  

  
  

  
  
  


  

  
  

  

  
  

  

  

--
Sam Gardner
Trustwave | SMART SECURITY ON DEMAND


cib-details.xml
Description: cib-details.xml
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org