[ClusterLabs] Clone Issue

Frank D. Engel, Jr. Sat, 13 Feb 2016 18:12:58 -0800

Hi,

I'm new to the software and to the list. I just started experimenting with trying to get a cluster working using CentOS 7 and the pcs utility, and I've made some progress, but I can't quite figure out why I'm seeing this one behavior. I'm hoping someone can help; it might be something simple I haven't picked up on yet.

I have three nodes configured (running under VirtualBox) with shared storage using GFS2 - that much seems to be working ok.

I have a service called "WebSite" representing the Apache configuration, and I cloned that to create "WebSite-clone", which I would expect to run one instance on each of the three nodes.

However, if I leave "globally-unique" off, it will only run on one node, whereas if I turn it on, it will run on two, but never on all three. I've tried a number of things to get this working. I did verify that I can manually start and stop Apache on all three nodes, and it works on any of them that way.

Currently my status looks like this (with globally-unique set to false; "cluster-data" is my GFS2 filesystem):


Cluster name: lincl

Last updated: Sat Feb 13 20:58:26 2016 Last change: Sat Feb 13 20:45:08 2016 by root via crm_resource on lincl2-hb

Stack: corosync

Current DC: lincl2-hb (version 1.1.13-10.el7-44eb2dd) - partition with quorum

3 nodes and 13 resources configured

Online: [ lincl0-hb lincl1-hb lincl2-hb ]

Full list of resources:

 kdump    (stonith:fence_kdump):    Started lincl0-hb
 Clone Set: dlm-clone [dlm]
     Started: [ lincl0-hb lincl1-hb lincl2-hb ]
 Master/Slave Set: cluster-data-clone [cluster-data]
     Slaves: [ lincl0-hb lincl1-hb lincl2-hb ]
 Clone Set: ClusterIP-clone [ClusterIP] (unique)
     ClusterIP:0    (ocf::heartbeat:IPaddr2):    Started lincl2-hb
     ClusterIP:1    (ocf::heartbeat:IPaddr2):    Started lincl0-hb
     ClusterIP:2    (ocf::heartbeat:IPaddr2):    Started lincl1-hb
 Clone Set: WebSite-clone [WebSite]
     Started: [ lincl0-hb ]
     Stopped: [ lincl1-hb lincl2-hb ]

Failed Actions:
* WebSite:0_start_0 on lincl2-hb 'unknown error' (1): call=142, status=Timed Out, exitreason='Failed to access httpd status page.',
    last-rc-change='Sat Feb 13 19:55:45 2016', queued=0ms, exec=120004ms
* WebSite:2_start_0 on lincl2-hb 'unknown error' (1): call=130, status=Timed Out, exitreason='none',
    last-rc-change='Sat Feb 13 19:33:49 2016', queued=0ms, exec=40003ms
* WebSite:1_monitor_60000 on lincl0-hb 'unknown error' (1): call=101, status=complete, exitreason='Failed to access httpd status page.',
    last-rc-change='Sat Feb 13 19:53:53 2016', queued=0ms, exec=0ms
* WebSite:0_monitor_60000 on lincl0-hb 'not running' (7): call=77, status=complete, exitreason='none',
    last-rc-change='Sat Feb 13 19:34:48 2016', queued=0ms, exec=0ms
* WebSite:2_start_0 on lincl1-hb 'unknown error' (1): call=41, status=Timed Out, exitreason='none',
    last-rc-change='Sat Feb 13 19:53:41 2016', queued=1ms, exec=120004ms


PCSD Status:
  lincl0-hb: Online
  lincl1-hb: Online
  lincl2-hb: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

I'm not sure how to further troubleshoot those "Failed Actions" or how to clear them from the display?
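
For reference, a minimal sketch of the pcs commands usually used to inspect and clear this kind of failure history, assuming pcs 0.9.x as shipped with CentOS 7. The function names and the PCS override (which allows a dry run that only prints the commands) are hypothetical conveniences, not part of pcs.

```shell
#!/bin/sh
# Sketch only: assumes pcs 0.9.x on a cluster node, run as root.
# Set PCS="echo pcs" for a dry run that prints the commands instead of running them.
PCS="${PCS:-pcs}"

# Show the per-node fail counts recorded for a resource (e.g. WebSite).
show_failures() {
    $PCS resource failcount show "$1"
}

# Clear the failure history of a resource or clone (e.g. WebSite-clone);
# this also removes its entries from "Failed Actions" in the status output.
clear_failures() {
    $PCS resource cleanup "$1"
}
```

Calling `show_failures WebSite` and then `clear_failures WebSite-clone` would be the typical order. After a cleanup the cluster probes the resource again, so a failure whose cause is still present is likely to reappear.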



Configuration of the WebSite-clone looks like:

[root@lincl2 /]# pcs resource show WebSite-clone
 Clone: WebSite-clone
  Meta Attrs: globally-unique=false clone-node-max=1 clone-max=3 interleave=true
  Resource: WebSite (class=ocf provider=heartbeat type=apache)
   Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status

   Operations: start interval=0s timeout=120s (WebSite-start-interval-0s)
               stop interval=0s timeout=60s (WebSite-stop-interval-0s)
               monitor interval=1min (WebSite-monitor-interval-1min)
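
The 'Failed to access httpd status page.' exit reason in the status output refers to the statusurl attribute shown here: the ocf:heartbeat:apache agent fetches that URL to decide whether Apache is up. A minimal sketch of checking it by hand on each node, assuming curl is installed; the function name and the CURL override are hypothetical.

```shell
#!/bin/sh
# Sketch only: run on each node while httpd is running there.
# Set CURL="echo curl" for a dry run that prints the command instead of running it.
CURL="${CURL:-curl}"

# Fetch the status page and discard the body. -f makes curl return non-zero
# on HTTP errors such as 403 or 404; --max-time bounds the wait to 10 seconds.
check_status_url() {
    url="${1:-http://localhost/server-status}"
    $CURL -fsS --max-time 10 -o /dev/null "$url"
}
```

A possible use is `check_status_url && echo "status page reachable"`. If the check fails on a node where Apache itself starts fine, one thing worth looking at (an assumption, not confirmed by the output above) is whether the server-status handler is enabled for localhost on that node.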



Now I change globally-unique to true, and this happens:

[root@lincl2 /]# pcs resource update WebSite-clone globally-unique=true
[root@lincl2 /]# pcs resource
 Clone Set: dlm-clone [dlm]
     Started: [ lincl0-hb lincl1-hb lincl2-hb ]
 Master/Slave Set: cluster-data-clone [cluster-data]
     Slaves: [ lincl0-hb lincl1-hb lincl2-hb ]
 Clone Set: ClusterIP-clone [ClusterIP] (unique)
     ClusterIP:0    (ocf::heartbeat:IPaddr2):    Started lincl2-hb
     ClusterIP:1    (ocf::heartbeat:IPaddr2):    Started lincl0-hb
     ClusterIP:2    (ocf::heartbeat:IPaddr2):    Started lincl1-hb
 Clone Set: WebSite-clone [WebSite] (unique)
     WebSite:0    (ocf::heartbeat:apache):    Started lincl0-hb
     WebSite:1    (ocf::heartbeat:apache):    Started lincl1-hb
     WebSite:2    (ocf::heartbeat:apache):    Stopped


Constraints are set up as follows:

[root@lincl2 /]# pcs constraint
Location Constraints:
Ordering Constraints:
  start dlm-clone then start cluster-data-clone (kind:Mandatory)
  start ClusterIP-clone then start WebSite-clone (kind:Mandatory)
  start cluster-data-clone then start WebSite-clone (kind:Mandatory)
Colocation Constraints:
  cluster-data-clone with dlm-clone (score:INFINITY)
  WebSite-clone with ClusterIP-clone (score:INFINITY)
  WebSite-clone with cluster-data-clone (score:INFINITY)

As far as I can tell, there is no activity in the Apache log files from pcs trying to start it and it failing or taking too long - it seems that it never gets far enough for Apache itself to be trying to start.

Can someone give me ideas on how to further troubleshoot this? Ideally I'd like it running one instance on each available node.
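
One possible starting point, sketched under the assumption that Pacemaker's crm_simulate tool is available on the nodes: printing the allocation scores for the live cluster shows how each clone instance is scored on each node, which can hint at why an instance is left stopped. The function name and the CRM_SIMULATE override are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes crm_simulate from Pacemaker 1.1, run as root on a cluster node.
# CRM_SIMULATE can be overridden for testing.
CRM_SIMULATE="${CRM_SIMULATE:-crm_simulate}"

# Print allocation scores for the live cluster (-s: show scores, -L: live CIB)
# and keep only the lines that mention the given resource (default: WebSite).
resource_scores() {
    $CRM_SIMULATE -sL | grep -- "${1:-WebSite}"
}
```

Running `resource_scores WebSite` on any node would list the score lines for the WebSite instances; a score of -INFINITY on a node means that instance is not allowed to run there.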





