Should be fixed now. Thanks for the report!

> On 12 Aug 2015, at 1:20 pm, [email protected] wrote:
>
> Hi All,
>
> We confirmed the behaviour of pacemaker_remote
> (version: pacemaker-ad1f397a8228a63949f86c96597da5cecc3ed977).
>
> The cluster consists of the following nodes:
> * bl460g8n3 (KVM host)
> * bl460g8n4 (KVM host)
> * pgsr01 (guest on the bl460g8n3 host)
> * pgsr02 (guest on the bl460g8n4 host)
>
>
> Step 1) I build a cluster with a simple set of resources.
>
> [root@bl460g8n3 ~]# crm_mon -1 -Af
> Last updated: Wed Aug 12 11:52:27 2015    Last change: Wed Aug 12 11:51:47 2015 by root via crm_resource on bl460g8n4
> Stack: corosync
> Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
> 4 nodes and 10 resources configured
>
> Online: [ bl460g8n3 bl460g8n4 ]
> GuestOnline: [ pgsr01@bl460g8n3 pgsr02@bl460g8n4 ]
>
> prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
> prmDB2 (ocf::heartbeat:VirtualDomain): Started bl460g8n4
> Resource Group: grpStonith1
>     prmStonith1-2 (stonith:external/ipmi): Started bl460g8n4
> Resource Group: grpStonith2
>     prmStonith2-2 (stonith:external/ipmi): Started bl460g8n3
> Resource Group: master-group
>     vip-master (ocf::heartbeat:Dummy): Started pgsr02
>     vip-rep (ocf::heartbeat:Dummy): Started pgsr02
> Master/Slave Set: msPostgresql [pgsql]
>     Masters: [ pgsr02 ]
>     Slaves: [ pgsr01 ]
>
> Node Attributes:
> * Node bl460g8n3:
> * Node bl460g8n4:
> * Node pgsr01@bl460g8n3:
>     + master-pgsql : 5
> * Node pgsr02@bl460g8n4:
>     + master-pgsql : 10
>
> Migration Summary:
> * Node bl460g8n4:
> * Node bl460g8n3:
> * Node pgsr02@bl460g8n4:
> * Node pgsr01@bl460g8n3:
>
>
> Step 2) I kill pacemaker_remote on pgsr02.
>
> [root@pgsr02 ~]# ps -ef | grep remote
> root      1171     1  0 11:52 ?        00:00:00 /usr/sbin/pacemaker_remoted
> root      1428  1377  0 11:53 pts/0    00:00:00 grep --color=auto remote
> [root@pgsr02 ~]# kill -9 1171
>
>
> Step 3) After the failure, the master-group resources do not start on pgsr01.
>
> [root@bl460g8n3 ~]# crm_mon -1 -Af
> Last updated: Wed Aug 12 11:54:04 2015    Last change: Wed Aug 12 11:51:47 2015 by root via crm_resource on bl460g8n4
> Stack: corosync
> Current DC: bl460g8n3 (version 1.1.13-ad1f397) - partition with quorum
> 4 nodes and 10 resources configured
>
> Online: [ bl460g8n3 bl460g8n4 ]
> GuestOnline: [ pgsr01@bl460g8n3 ]
>
> prmDB1 (ocf::heartbeat:VirtualDomain): Started bl460g8n3
> prmDB2 (ocf::heartbeat:VirtualDomain): FAILED bl460g8n4
> Resource Group: grpStonith1
>     prmStonith1-2 (stonith:external/ipmi): Started bl460g8n4
> Resource Group: grpStonith2
>     prmStonith2-2 (stonith:external/ipmi): Started bl460g8n3
> Master/Slave Set: msPostgresql [pgsql]
>     Masters: [ pgsr01 ]
>
> Node Attributes:
> * Node bl460g8n3:
> * Node bl460g8n4:
> * Node pgsr01@bl460g8n3:
>     + master-pgsql : 10
>
> Migration Summary:
> * Node bl460g8n4:
>     pgsr02: migration-threshold=1 fail-count=1 last-failure='Wed Aug 12 11:53:39 2015'
> * Node bl460g8n3:
> * Node pgsr01@bl460g8n3:
>
> Failed Actions:
> * pgsr02_monitor_30000 on bl460g8n4 'unknown error' (1): call=2, status=Error, exitreason='none',
>     last-rc-change='Wed Aug 12 11:53:39 2015', queued=0ms, exec=0ms
>
>
> The cause appears to be that STONITH is somehow not carried out.
> The demote operation, which the cluster cannot execute, seems to block the
> start on pgsr01.
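
A side note while you are still on a build without the fix: you can confirm
that fencing is enabled at all, and list the fence devices the cluster has
registered, with the standard tools (these are generic pacemaker commands;
only the node name is taken from your report):

[root@bl460g8n3 ~]# crm_attribute --type crm_config --name stonith-enabled --query
[root@bl460g8n3 ~]# stonith_admin --list-registered    # fence devices known to the cluster

Also note that a guest node such as pgsr02 is fenced by recovering its
container resource (prmDB2 here) rather than through a stonith device, which
is why prmDB2 shows FAILED in your step 3 output.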
> --------------------------------------------------------------------------------------
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: Graph 10 with 20 actions: batch-limit=20 jobs, network-delay=0ms
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 4]: Pending rsc op prmDB2_stop_0 on bl460g8n4 (priority: 0, waiting: 70)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 36]: Completed pseudo op master-group_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 34]: Completed pseudo op master-group_start_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 82]: Completed rsc op pgsql_post_notify_demote_0 on pgsr01 (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 81]: Completed rsc op pgsql_pre_notify_demote_0 on pgsr01 (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 78]: Completed rsc op pgsql_post_notify_stop_0 on pgsr01 (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 77]: Completed rsc op pgsql_pre_notify_stop_0 on pgsr01 (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 67]: Completed pseudo op msPostgresql_confirmed-post_notify_demoted_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 66]: Completed pseudo op msPostgresql_post_notify_demoted_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 65]: Completed pseudo op msPostgresql_confirmed-pre_notify_demote_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 64]: Completed pseudo op msPostgresql_pre_notify_demote_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 63]: Completed pseudo op msPostgresql_demoted_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 62]: Completed pseudo op msPostgresql_demote_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 55]: Completed pseudo op msPostgresql_confirmed-post_notify_stopped_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 54]: Completed pseudo op msPostgresql_post_notify_stopped_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 53]: Completed pseudo op msPostgresql_confirmed-pre_notify_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 52]: Completed pseudo op msPostgresql_pre_notify_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 51]: Completed pseudo op msPostgresql_stopped_0 on N/A (priority: 1000000, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 50]: Completed pseudo op msPostgresql_stop_0 on N/A (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: [Action 70]: Pending rsc op pgsr02_stop_0 on bl460g8n4 (priority: 0, waiting: none)
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: * [Input 38]: Unresolved dependency rsc op pgsql_demote_0 on pgsr02
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: info: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> Aug 12 12:08:40 bl460g8n3 crmd[9427]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> --------------------------------------------------------------------------------------
>
> Is there a setting that lets the cluster carry out STONITH correctly here?
> Or is this a bug in pacemaker_remote?
>
> * I registered this issue in Bugzilla
>   (http://bugs.clusterlabs.org/show_bug.cgi?id=5247).
> * I also attached a crm_report to the Bugzilla entry.
>
> Best Regards,
> Hideo Yamauchi.
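
For anyone digging into the attached crm_report: the aborted transition above
can be replayed offline with crm_simulate against the pe-input file that
produced "Graph 10". The file name below is a placeholder; substitute the
matching file from the report.

[root@bl460g8n3 ~]# crm_simulate --simulate --xml-file pe-input-10.bz2 \
                        --save-dotfile transition10.dot    # pe-input-10.bz2: placeholder name
[root@bl460g8n3 ~]# dot -Tsvg transition10.dot -o transition10.svg

The resulting graph should make it easier to see why pgsql_demote_0 on pgsr02
remains an unresolved dependency and blocks the restart of master-group on
pgsr01.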
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
