Hi All,

We have confirmed the following behavior of pacemaker_remote
(version: pacemaker-ad1f397a8228a63949f86c96597da5cecc3ed977).

The cluster consists of the following nodes:
 * bl460g8n3 (KVM host)
 * bl460g8n4 (KVM host)
 * pgsr01 (guest on host bl460g8n3)
 * pgsr02 (guest on host bl460g8n4)


Step 1) I build a cluster with simple resources.
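
For reference, the guest nodes pgsr01/pgsr02 are integrated as pacemaker_remote guest nodes along the lines of the following crm configure sketch (the domain XML paths and the hypervisor URI are assumptions, not our exact settings):

  # VirtualDomain resources that also define the guest nodes via the remote-node meta attribute
  primitive prmDB1 ocf:heartbeat:VirtualDomain \
      params config="/etc/libvirt/qemu/pgsr01.xml" hypervisor="qemu:///system" \
      meta remote-node="pgsr01"
  primitive prmDB2 ocf:heartbeat:VirtualDomain \
      params config="/etc/libvirt/qemu/pgsr02.xml" hypervisor="qemu:///system" \
      meta remote-node="pgsr02"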

[root@bl460g8n3 ~]# crm_mon -1 -Af
Last updated: Wed Aug 12 11:52:27 2015          Last change: Wed Aug 12 11:51:47 2015 by root via crm_resource on bl460g8n4
Stack: corosync
Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
4 nodes and 10 resources configured

Online: [ bl460g8n3 bl460g8n4 ]
GuestOnline: [ pgsr01@bl460g8n3 pgsr02@bl460g8n4 ]

 prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
 prmDB2 (ocf::heartbeat:VirtualDomain): Started bl460g8n4
 Resource Group: grpStonith1
     prmStonith1-2      (stonith:external/ipmi):        Started bl460g8n4
 Resource Group: grpStonith2
     prmStonith2-2      (stonith:external/ipmi):        Started bl460g8n3
 Resource Group: master-group
     vip-master (ocf::heartbeat:Dummy): Started pgsr02
     vip-rep    (ocf::heartbeat:Dummy): Started pgsr02
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ pgsr02 ]
     Slaves: [ pgsr01 ]

Node Attributes:
* Node bl460g8n3:
* Node bl460g8n4:
* Node pgsr01@bl460g8n3:
    + master-pgsql                      : 5         
* Node pgsr02@bl460g8n4:
    + master-pgsql                      : 10        

Migration Summary:
* Node bl460g8n4:
* Node bl460g8n3:
* Node pgsr02@bl460g8n4:
* Node pgsr01@bl460g8n3:
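
As a side note, the master-pgsql values above are the promotion scores set by the pgsql resource agent; pgsr02 is promoted because its score (10) is higher than pgsr01's (5). Such a score can be read back with something like the following sketch (whether the attribute lifetime is "reboot" or "forever" depends on the agent and is an assumption here):

  crm_attribute -N pgsr02 -n master-pgsql -l reboot -G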


Step 2) I cause a failure of pacemaker_remote on pgsr02.

[root@pgsr02 ~]# ps -ef |grep remote
root      1171     1  0 11:52 ?        00:00:00 /usr/sbin/pacemaker_remoted
root      1428  1377  0 11:53 pts/0    00:00:00 grep --color=auto remote
[root@pgsr02 ~]# kill -9 1171


Step 3) After the failure, the master-group resource does not start on pgsr01.

[root@bl460g8n3 ~]# crm_mon -1 -Af
Last updated: Wed Aug 12 11:54:04 2015          Last change: Wed Aug 12 11:51:47 2015 by root via crm_resource on bl460g8n4
Stack: corosync
Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
4 nodes and 10 resources configured

Online: [ bl460g8n3 bl460g8n4 ]
GuestOnline: [ pgsr01@bl460g8n3 ]

 prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
 prmDB2 (ocf::heartbeat:VirtualDomain): FAILED bl460g8n4
 Resource Group: grpStonith1
     prmStonith1-2      (stonith:external/ipmi):        Started bl460g8n4
 Resource Group: grpStonith2
     prmStonith2-2      (stonith:external/ipmi):        Started bl460g8n3
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ pgsr01 ]

Node Attributes:
* Node bl460g8n3:
* Node bl460g8n4:
* Node pgsr01@bl460g8n3:
    + master-pgsql                      : 10        

Migration Summary:
* Node bl460g8n4:
   pgsr02: migration-threshold=1 fail-count=1 last-failure='Wed Aug 12 11:53:39 2015'
* Node bl460g8n3:
* Node pgsr01@bl460g8n3:

Failed Actions:
* pgsr02_monitor_30000 on bl460g8n4 'unknown error' (1): call=2, status=Error, exitreason='none',
    last-rc-change='Wed Aug 12 11:53:39 2015', queued=0ms, exec=0ms


The cause appears to be that STONITH is not carried out for some reason.
The demote operation, which the cluster can no longer perform on pgsr02 (note the unresolved dependency on pgsql_demote_0 in the log below), seems to block the start of master-group on pgsr01.
--------------------------------------------------------------------------------------
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: Graph 10 with 20 actions: batch-limit=20 jobs, network-delay=0ms
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action    4]: Pending rsc op prmDB2_stop_0                       on bl460g8n4 (priority: 0, waiting:  70)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   36]: Completed pseudo op master-group_stop_0            on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   34]: Completed pseudo op master-group_start_0           on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   82]: Completed rsc op pgsql_post_notify_demote_0        on pgsr01 (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   81]: Completed rsc op pgsql_pre_notify_demote_0         on pgsr01 (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   78]: Completed rsc op pgsql_post_notify_stop_0          on pgsr01 (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   77]: Completed rsc op pgsql_pre_notify_stop_0           on pgsr01 (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   67]: Completed pseudo op msPostgresql_confirmed-post_notify_demoted_0 on N/A (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   66]: Completed pseudo op msPostgresql_post_notify_demoted_0 on N/A (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   65]: Completed pseudo op msPostgresql_confirmed-pre_notify_demote_0 on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   64]: Completed pseudo op msPostgresql_pre_notify_demote_0 on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   63]: Completed pseudo op msPostgresql_demoted_0         on N/A (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   62]: Completed pseudo op msPostgresql_demote_0          on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   55]: Completed pseudo op msPostgresql_confirmed-post_notify_stopped_0 on N/A (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   54]: Completed pseudo op msPostgresql_post_notify_stopped_0 on N/A (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   53]: Completed pseudo op msPostgresql_confirmed-pre_notify_stop_0 on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   52]: Completed pseudo op msPostgresql_pre_notify_stop_0 on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   51]: Completed pseudo op msPostgresql_stopped_0         on N/A (priority: 1000000, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   50]: Completed pseudo op msPostgresql_stop_0            on N/A (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action   70]: Pending rsc op pgsr02_stop_0                       on bl460g8n4 (priority: 0, waiting: none)
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice:  * [Input 38]: Unresolved dependency rsc op pgsql_demote_0 on pgsr02
Aug 12 12:08:40 bl460g8n3 crmd[9427]: info: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
--------------------------------------------------------------------------------------
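
For reference, the transition above can be re-examined offline from the pe-input file that crm_report collects, with a command along these lines (a sketch; the file name is a placeholder):

  crm_simulate -S -x pe-input-NN.bz2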

Is there a setting that would make the cluster carry out STONITH correctly in this case?
Or is this a bug in pacemaker_remote?

 * I have filed this issue in Bugzilla: http://bugs.clusterlabs.org/show_bug.cgi?id=5247
 * In addition, I attached a crm_report archive to the Bugzilla entry.

Best Regards,
Hideo Yamauchi.

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
