"Users" <[email protected]> schrieb am 16.07.2018 15:12:14:
> From: "Ulrich Windl" <[email protected]>
> To: <[email protected]>
> Date: 16.07.2018 15:12
> Subject: [ClusterLabs] Re: Re: Re: Re: Re: Re: corosync/dlm fencing?
> Sent by: "Users" <[email protected]>
>
> >>> Philipp Achmüller <[email protected]> wrote on 16.07.2018 at 14:09 in message <[email protected]>:
> > Hi,
> >
> >> From: "Ulrich Windl" <[email protected]>
> >> To: <[email protected]>
> >> Date: 16.07.2018 13:46
> >> Subject: [ClusterLabs] Re: Re: Re: Re: corosync/dlm fencing?
> >> Sent by: "Users" <[email protected]>
> >>
> >> Hi again!
> >>
> >> Oh, I missed "...maintenance i would like to standby 1 or 2 nodes from
> >> "sitea""...
> >>
> >
> > Yes - sorry for the tons of logs - but I think they will catch the whole
> > situation...
> >
> >> I think some time ago I had asked about the same thing for SLES11, and
> >> the answer was that with some configurations a standby is not possible
> >> (I thought it was related to OCFS2, but maybe it was cLVM or DLM).
> >> Quorum issues aside, why not shut down the node completely?
> >
> > I also tried "systemctl stop pacemaker" and a "normal" system shutdown ->
> > same outcome: there is always DLM fencing some other nodes...
>
> Hmm... I looked up my last node shutdown on the three-node cluster. The
> essential messages look like this (many messages left out, sometimes
> indicated by "(more)"):
>
> attrd[13169]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> pengine[13170]: notice: stage6: Scheduling Node h10 for shutdown
> pengine[13170]: notice: LogActions: Stop prm_DLM:1 (h10)
> pengine[13170]: notice: LogActions: Stop prm_cLVMd:1 (h10)
> pengine[13170]: notice: LogActions: Stop prm_LVM_CFS_VMs:1 (h10)
> (more)
> crmd[13171]: notice: te_rsc_command: Initiating action 97: stop prm_CFS_VMs_fs_stop_0 on h10
> cluster-dlm[13901]: log_config: dlm:ls:490B9FCAFA3D4B2F9A586A5893E00730 conf 2 0 1 memb 739512321 739512325 join left 739512330
> cluster-dlm[13901]: add_change: 490B9FCAFA3D4B2F9A586A5893E00730 add_change cg 8 remove nodeid 739512330 reason 2
> (more)
> cluster-dlm[13901]: receive_plocks_stored: 490B9FCAFA3D4B2F9A586A5893E00730 receive_plocks_stored 739512321:8 flags a sig 0 need_plocks 0
> ocfs2_controld[14389]: confchg called
> ocfs2_controld[14389]: node daemon left 739512330
> cluster-dlm[13901]: log_config: dlm:ls:clvmd conf 2 0 1 memb 739512321 739512325 join left 739512330
> cluster-dlm[13901]: stop_kernel: clvmd stop_kernel cg 8
> cluster-dlm[13901]: log_config: dlm:controld conf 2 0 1 memb 739512321 739512325 join left 739512330
> crmd[13171]: notice: run_graph: Transition 10 (Complete=41, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-26.bz2): Complete
> cib[13166]: notice: crm_update_peer_state: cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)
> corosync[13162]: [CLM ] Members Left:
> corosync[13162]: [CLM ] r(0) ip(172.20.16.10) r(1) ip(10.2.2.10)
> stonith-ng[13167]: notice: crm_update_peer_state: st_peer_update_callback: Node h10[739512330] - state is now lost (was member)
> cluster-dlm[13901]: dlm_process_node: Removed inactive node 739512330: born-on=3804, last-seen=3804, this-event=3808, last-event=3804
> kernel: [349683.507286] dlm: closing connection to node 739512330
> crmd[13171]: notice: peer_update_callback: do_shutdown of h10 (op 190) is complete
> corosync[13162]: [MAIN ] Completed service synchronization, ready to provide service.
> crmd[13171]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
>
> Another random thought: How do you collect your logs? The node being
> fenced may not have the last few log messages written to disk (SLES11
> syslog, at least). Sometimes it's a good idea to "tail -f" the syslogs on
> each node, so if a node goes down, you at least see what was expected to
> be in its syslog. Chances may be better if you have a remote syslog
> server reached via DGRAM sockets...

There are messages in syslog, but nothing cluster-related.

regards
Philipp

> Regards,
> Ulrich
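As a concrete sketch of that suggestion (untested here; "loghost" is just
a placeholder for whatever central log host is available), classic
UDP/DGRAM forwarding with rsyslog can be as small as:

  # /etc/rsyslog.d/remote.conf on every cluster node
  # a single "@" means UDP (a DGRAM socket), so the last messages may
  # still get out even if the node is fenced a moment later
  *.* @loghost:514

  # activate the new rule
  systemctl restart rsyslog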
> >
> >> Regards,
> >> Ulrich
> >>
> >> >>> "Ulrich Windl" <[email protected]> wrote on 16.07.2018 at 13:35 in message <[email protected]>:
> >> > >>> Philipp Achmüller <[email protected]> wrote on 16.07.2018 at 11:44 in message <[email protected]>:
> >> >> hi!
> >> >>
> >> >> Thank you for the comment.
> >> >> Unfortunately it is not obvious to me - the "grep fence" output is
> >> >> attached in my original message.
> >> >
> >> > Hi!
> >> >
> >> > OK, it seems I missed finding the needle in all the hay...
> >> > Anyway, I think the problem is "Cluster node siteb-1 will be fenced:
> >> > peer is no longer part of the cluster". It looks as if the cluster
> >> > noticed an unclean shutdown of node siteb-1. The message
> >> > "Stonith/shutdown of siteb-1 not matched" seems to confirm that.
> >> > When shutting down two nodes, did you wait until shutdown of the
> >> > first node had succeeded before shutting down the second?
> >> >
> >> > Regards,
> >> > Ulrich
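To make that sequencing explicit - a minimal sketch, assuming the crmsh
tooling that SLES12 ships - take one node out at a time and wait for the
cluster to go idle in between:

  # put the first node into standby and wait until the transition settles
  crm node standby sitea-1
  crm_resource --wait

  # only then take the second node out
  crm node standby sitea-2
  crm_resource --wait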
> >> >>
> >> >> I searched older logs with debug information activated for DLM -
> >> >> this is the sequence from syslog (from another timeframe):
> >> >>
> >> >> -------------------
> >> >> Node siteb-2 (DC):
> >> >> ............
> >> >> 2018-06-28T09:02:23.272415+02:00 siteb-2 crmd[189260]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> >> >> 2018-06-28T09:02:23.279028+02:00 siteb-2 pengine[189259]: notice: Watchdog will be used via SBD if fencing is required
> >> >> 2018-06-28T09:02:23.279214+02:00 siteb-2 pengine[189259]: notice: On loss of CCM Quorum: Ignore
> >> >> 2018-06-28T09:02:23.282153+02:00 siteb-2 pengine[189259]: notice: Move stonith-sbd#011(Started sitea-1 -> siteb-1)
> >> >> 2018-06-28T09:02:23.282249+02:00 siteb-2 pengine[189259]: notice: Move cl-info#011(Started sitea-1 -> siteb-2)
> >> >> 2018-06-28T09:02:23.282338+02:00 siteb-2 pengine[189259]: notice: Move k45RG#011(Started sitea-2 -> siteb-1)
> >> >> 2018-06-28T09:02:23.282422+02:00 siteb-2 pengine[189259]: notice: Stop dlm:0#011(sitea-1)
> >> >> 2018-06-28T09:02:23.282505+02:00 siteb-2 pengine[189259]: notice: Stop clvm:0#011(sitea-1)
> >> >> 2018-06-28T09:02:23.282588+02:00 siteb-2 pengine[189259]: notice: Stop vg1:0#011(sitea-1)
> >> >> 2018-06-28T09:02:23.282670+02:00 siteb-2 pengine[189259]: notice: Stop dlm:3#011(sitea-2)
> >> >> 2018-06-28T09:02:23.282752+02:00 siteb-2 pengine[189259]: notice: Stop clvm:3#011(sitea-2)
> >> >> 2018-06-28T09:02:23.282833+02:00 siteb-2 pengine[189259]: notice: Stop vg1:3#011(sitea-2)
> >> >> 2018-06-28T09:02:23.282916+02:00 siteb-2 pengine[189259]: notice: Stop sysinfo:0#011(sitea-1)
> >> >> 2018-06-28T09:02:23.283001+02:00 siteb-2 pengine[189259]: notice: Stop sysinfo:3#011(sitea-2)
> >> >> 2018-06-28T09:02:23.283978+02:00 siteb-2 pengine[189259]: notice: Calculated transition 1056, saving inputs in /var/lib/pacemaker/pengine/pe-input-2321.bz2
> >> >> 2018-06-28T09:02:23.284428+02:00 siteb-2 crmd[189260]: notice: Processing graph 1056 (ref=pe_calc-dc-1530169343-1339) derived from /var/lib/pacemaker/pengine/pe-input-2321.bz2
> >> >> 2018-06-28T09:02:23.284575+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation stonith-sbd_stop_0 on sitea-1
> >> >> 2018-06-28T09:02:23.284659+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation cl-info_stop_0 on sitea-1
> >> >> 2018-06-28T09:02:23.284742+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation k45RG_stop_0 on sitea-2
> >> >> 2018-06-28T09:02:23.284824+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation vg1_stop_0 on sitea-1
> >> >> 2018-06-28T09:02:23.284908+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation vg1_stop_0 on sitea-2
> >> >> 2018-06-28T09:02:23.284990+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation sysinfo_stop_0 on sitea-1
> >> >> 2018-06-28T09:02:23.285072+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation sysinfo_stop_0 on sitea-2
> >> >> 2018-06-28T09:02:23.288254+02:00 siteb-2 crmd[189260]: notice: Initiating start operation stonith-sbd_start_0 on siteb-1
> >> >> 2018-06-28T09:02:23.298867+02:00 siteb-2 crmd[189260]: notice: Initiating start operation cl-info_start_0 locally on siteb-2
> >> >> 2018-06-28T09:02:23.309272+02:00 siteb-2 lrmd[189257]: notice: executing - rsc:cl-info action:start call_id:105
> >> >> 2018-06-28T09:02:23.384074+02:00 siteb-2 lrmd[189257]: notice: finished - rsc:cl-info action:start call_id:105 pid:253747 exit-code:0 exec-time:75ms queue-time:0ms
> >> >> 2018-06-28T09:02:23.393759+02:00 siteb-2 crmd[189260]: notice: Result of start operation for cl-info on siteb-2: 0 (ok)
> >> >> 2018-06-28T09:02:23.395594+02:00 siteb-2 crmd[189260]: notice: Initiating monitor operation cl-info_monitor_60000 locally on siteb-2
> >> >> 2018-06-28T09:02:24.159586+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation clvm_stop_0 on sitea-2
> >> >> 2018-06-28T09:02:24.193317+02:00 siteb-2 kernel: [80844.122213] dlm: clvmd: dlm_recover 5
> >> >> 2018-06-28T09:02:24.193349+02:00 siteb-2 kernel: [80844.122240] dlm: clvmd: dlm_clear_toss 1 done
> >> >> 2018-06-28T09:02:24.193351+02:00 siteb-2 kernel: [80844.122251] dlm: clvmd: remove member 3
> >> >> 2018-06-28T09:02:24.193352+02:00 siteb-2 kernel: [80844.122579] dlm: clvmd: dlm_recover_members 3 nodes
> >> >> 2018-06-28T09:02:24.993269+02:00 siteb-2 kernel: [80844.920751] dlm: clvmd: generation 7 slots 3 1:2 2:4 3:1
> >> >> 2018-06-28T09:02:24.993283+02:00 siteb-2 kernel: [80844.920755] dlm: clvmd: dlm_recover_directory
> >> >> 2018-06-28T09:02:25.189265+02:00 siteb-2 kernel: [80845.118173] dlm: clvmd: dlm_recover_directory 1 in 0 new
> >> >> 2018-06-28T09:02:25.604661+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation clvm_stop_0 on sitea-1
> >> >> 2018-06-28T09:02:26.189277+02:00 siteb-2 kernel: [80846.116630] dlm: clvmd: dlm_recover_directory 0 out 2 messages
> >> >> 2018-06-28T09:02:26.189291+02:00 siteb-2 kernel: [80846.116635] dlm: clvmd: dlm_recover_masters
> >> >> 2018-06-28T09:02:26.189292+02:00 siteb-2 kernel: [80846.116648] dlm: clvmd: dlm_recover_masters 0 of 2
> >> >> 2018-06-28T09:02:26.189293+02:00 siteb-2 kernel: [80846.116649] dlm: clvmd: dlm_recover_locks 0 out
> >> >> 2018-06-28T09:02:26.208948+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation dlm_stop_0 on sitea-2
> >> >> 2018-06-28T09:02:26.789266+02:00 siteb-2 kernel: [80846.716532] dlm: clvmd: dlm_recover_locks 0 in
> >> >> 2018-06-28T09:02:27.013264+02:00 siteb-2 kernel: [80846.940071] dlm: clvmd: dlm_recover_done_wait error -4
> >> >> 2018-06-28T09:02:27.013276+02:00 siteb-2 kernel: [80846.940076] dlm: clvmd: dlm_recover 5 error -4
> >> >> 2018-06-28T09:02:27.093263+02:00 siteb-2 kernel: [80847.022308] dlm: clvmd: dlm_recover 7
> >> >> 2018-06-28T09:02:27.093273+02:00 siteb-2 kernel: [80847.022372] dlm: clvmd: prev removed member 3
> >> >> 2018-06-28T09:02:27.093274+02:00 siteb-2 kernel: [80847.022375] dlm: clvmd: remove member 1
> >> >> 2018-06-28T09:02:27.493266+02:00 siteb-2 kernel: [80847.420198] dlm: clvmd: dlm_recover_members 2 nodes
> >> >> 2018-06-28T09:02:38.414440+02:00 siteb-2 corosync[189245]: [TOTEM ] A processor failed, forming new configuration.
> >> >> 2018-06-28T09:02:52.080141+02:00 siteb-2 corosync[189245]: [TOTEM ] A new membership (192.168.121.55:2012) was formed. Members left: 2
> >> >> 2018-06-28T09:02:52.080537+02:00 siteb-2 corosync[189245]: [TOTEM ] Failed to receive the leave message. failed: 2
> >> >> 2018-06-28T09:02:52.083415+02:00 siteb-2 attrd[189258]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.083533+02:00 siteb-2 attrd[189258]: notice: Removing all siteb-1 attributes for peer loss
> >> >> 2018-06-28T09:02:52.083617+02:00 siteb-2 attrd[189258]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.083700+02:00 siteb-2 cib[189255]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.083793+02:00 siteb-2 cib[189255]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.083875+02:00 siteb-2 stonith-ng[189256]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.083968+02:00 siteb-2 stonith-ng[189256]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.084054+02:00 siteb-2 crmd[189260]: warning: No reason to expect node 2 to be down
> >> >> 2018-06-28T09:02:52.084156+02:00 siteb-2 crmd[189260]: notice: Stonith/shutdown of siteb-1 not matched
> >> >> 2018-06-28T09:02:52.084242+02:00 siteb-2 crmd[189260]: notice: Transition aborted: Node failure
> >> >> 2018-06-28T09:02:52.084327+02:00 siteb-2 crmd[189260]: notice: Action 17 (stonith-sbd_start_0) was pending on 2 (offline)
> >> >> 2018-06-28T09:02:52.084409+02:00 siteb-2 corosync[189245]: [QUORUM] Members[3]: 1 3 4
> >> >> 2018-06-28T09:02:52.084492+02:00 siteb-2 corosync[189245]: [MAIN ] Completed service synchronization, ready to provide service.
> >> >> 2018-06-28T09:02:52.084574+02:00 siteb-2 pacemakerd[189252]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.084667+02:00 siteb-2 crmd[189260]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.084749+02:00 siteb-2 crmd[189260]: warning: No reason to expect node 2 to be down
> >> >> 2018-06-28T09:02:52.084831+02:00 siteb-2 crmd[189260]: notice: Stonith/shutdown of siteb-1 not matched
> >> >> 2018-06-28T09:02:52.085203+02:00 siteb-2 kernel: [80872.011754] dlm: clvmd: dlm_recover_members_wait error -4
> >> >> 2018-06-28T09:02:52.085209+02:00 siteb-2 kernel: [80872.011757] dlm: clvmd: dlm_recover 7 error -4
> >> >> 2018-06-28T09:02:52.085210+02:00 siteb-2 kernel: [80872.012486] dlm: closing connection to node 2
> >> >> 2018-06-28T09:02:52.086174+02:00 siteb-2 crmd[189260]: notice: Transition 1056 (Complete=18, Pending=0, Fired=0, Skipped=4, Incomplete=4, Source=/var/lib/pacemaker/pengine/pe-input-2321.bz2): Stopped
> >> >> 2018-06-28T09:02:53.095172+02:00 siteb-2 pengine[189259]: notice: Watchdog will be used via SBD if fencing is required
> >> >> 2018-06-28T09:02:53.095348+02:00 siteb-2 pengine[189259]: notice: On loss of CCM Quorum: Ignore
> >> >> 2018-06-28T09:02:53.095584+02:00 siteb-2 pengine[189259]: warning: Cluster node siteb-1 will be fenced: peer is no longer part of the cluster
> >> >> 2018-06-28T09:02:53.095673+02:00 siteb-2 pengine[189259]: warning: Node siteb-1 is unclean
> >> >> 2018-06-28T09:02:53.097943+02:00 siteb-2 pengine[189259]: warning: Action stonith-sbd_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098028+02:00 siteb-2 pengine[189259]: warning: Action dlm:2_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098110+02:00 siteb-2 pengine[189259]: warning: Action dlm:2_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098191+02:00 siteb-2 pengine[189259]: warning: Action clvm:2_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098273+02:00 siteb-2 pengine[189259]: warning: Action clvm:2_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098354+02:00 siteb-2 pengine[189259]: warning: Action vg1:2_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098436+02:00 siteb-2 pengine[189259]: warning: Action vg1:2_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098521+02:00 siteb-2 pengine[189259]: warning: Action sysinfo:1_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098603+02:00 siteb-2 pengine[189259]: warning: Action sysinfo:1_stop_0 on siteb-1 is unrunnable (offline)
> >> >> 2018-06-28T09:02:53.098683+02:00 siteb-2 pengine[189259]: warning: Scheduling Node siteb-1 for STONITH
> >> >> 2018-06-28T09:02:53.098764+02:00 siteb-2 pengine[189259]: notice: * Fence siteb-1
> >> >> 2018-06-28T09:02:53.098846+02:00 siteb-2 pengine[189259]: notice: Move stonith-sbd#011(Started siteb-1 -> siteb-2)
> >> >> 2018-06-28T09:02:53.098928+02:00 siteb-2 pengine[189259]: notice: Stop dlm:0#011(sitea-1)
> >> >> 2018-06-28T09:02:53.099010+02:00 siteb-2 pengine[189259]: notice: Stop dlm:2#011(siteb-1)
> >> >> 2018-06-28T09:02:53.099094+02:00 siteb-2 pengine[189259]: notice: Stop clvm:2#011(siteb-1)
> >> >> 2018-06-28T09:02:53.099177+02:00 siteb-2 pengine[189259]: notice: Stop vg1:2#011(siteb-1)
> >> >> 2018-06-28T09:02:53.099258+02:00 siteb-2 pengine[189259]: notice: Stop sysinfo:1#011(siteb-1)
> >> >> 2018-06-28T09:02:53.099339+02:00 siteb-2 pengine[189259]: notice: Start k45RG#011(siteb-2)
> >> >> 2018-06-28T09:02:53.099715+02:00 siteb-2 pengine[189259]: warning: Calculated transition 1057 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-6.bz2
> >> >> 2018-06-28T09:02:53.100282+02:00 siteb-2 crmd[189260]: notice: Processing graph 1057 (ref=pe_calc-dc-1530169373-1354) derived from /var/lib/pacemaker/pengine/pe-warn-6.bz2
> >> >> 2018-06-28T09:02:53.100407+02:00 siteb-2 crmd[189260]: notice: Initiating start operation stonith-sbd_start_0 locally on siteb-2
> >> >> 2018-06-28T09:02:53.112921+02:00 siteb-2 crmd[189260]: notice: Requesting fencing (reboot) of node siteb-1
> >> >> 2018-06-28T09:02:53.113113+02:00 siteb-2 stonith-ng[189256]: notice: Client crmd.189260.7d286b46 wants to fence (reboot) 'siteb-1' with device '(any)'
> >> >> 2018-06-28T09:02:53.113308+02:00 siteb-2 stonith-ng[189256]: notice: Merging stonith action reboot for node siteb-1 originating from client crmd.189260.7b806ca3 with identical request from [email protected] (72s)
> >> >> 2018-06-28T09:03:29.097551+02:00 siteb-2 stonith-ng[189256]: notice: Operation reboot of siteb-1 by sitea-2 for [email protected]: OK
> >> >> 2018-06-28T09:03:29.097669+02:00 siteb-2 stonith-ng[189256]: notice: Operation reboot of siteb-1 by sitea-2 for [email protected]: OK
> >> >> 2018-06-28T09:03:29.097756+02:00 siteb-2 crmd[189260]: notice: Peer siteb-1 was terminated (reboot) by sitea-2 on behalf of stonith-api.171087: OK
> >> >> 2018-06-28T09:03:29.097863+02:00 siteb-2 crmd[189260]: notice: Transition aborted: External Fencing Operation
> >> >> 2018-06-28T09:03:29.097948+02:00 siteb-2 crmd[189260]: notice: Stonith operation 2/51:1057:0:1e551e09-fe55-4e9f-9447-a5199f92bf8c: OK (0)
> >> >> 2018-06-28T09:03:29.098036+02:00 siteb-2 crmd[189260]: notice: Peer siteb-1 was terminated (reboot) by sitea-2 on behalf of crmd.189260: OK
> >> >> 2018-06-28T09:03:29.098196+02:00 siteb-2 crmd[189260]: notice: Transition 1057 (Complete=14, Pending=0, Fired=0, Skipped=2, Incomplete=6, Source=/var/lib/pacemaker/pengine/pe-warn-6.bz2): Stopped
> >> >> 2018-06-28T09:03:30.100556+02:00 siteb-2 dlm_controld[205418]: 80917 fence status 2 receive 0 from 1 walltime 1530169410 local 80917
> >> >> 2018-06-28T09:03:30.101210+02:00 siteb-2 kernel: [80910.025422] dlm: clvmd: dlm_recover 9
> >> >> 2018-06-28T09:03:30.101216+02:00 siteb-2 kernel: [80910.025444] dlm: clvmd: prev removed member 1
> >> >> 2018-06-28T09:03:30.101216+02:00 siteb-2 kernel: [80910.025446] dlm: clvmd: prev removed member 3
> >> >> 2018-06-28T09:03:30.101217+02:00 siteb-2 kernel: [80910.025446] dlm: clvmd: remove member 2
> >> >> 2018-06-28T09:03:30.101217+02:00 siteb-2 kernel: [80910.025448] dlm: clvmd: dlm_recover_members 1 nodes
> >> >> 2018-06-28T09:03:30.101219+02:00 siteb-2 kernel: [80910.025450] dlm: clvmd: generation 8 slots 1 2:4
> >> >> 2018-06-28T09:03:30.101222+02:00 siteb-2 kernel: [80910.025452] dlm: clvmd: dlm_recover_directory
> >> >> 2018-06-28T09:03:30.101223+02:00 siteb-2 kernel: [80910.025452] dlm: clvmd: dlm_recover_directory 0 in 0 new
> >> >> 2018-06-28T09:03:30.101223+02:00 siteb-2 kernel: [80910.025454] dlm: clvmd: dlm_recover_directory 0 out 0 messages
> >> >> 2018-06-28T09:03:30.101223+02:00 siteb-2 kernel: [80910.025455] dlm: clvmd: dlm_recover_masters
> >> >> 2018-06-28T09:03:30.101224+02:00 siteb-2 kernel: [80910.025456] dlm: clvmd: dlm_recover_masters 2 of 2
> >> >> 2018-06-28T09:03:30.101224+02:00 siteb-2 kernel: [80910.025457] dlm: clvmd: dlm_recover_locks 0 out
> >> >> 2018-06-28T09:03:30.101226+02:00 siteb-2 kernel: [80910.025458] dlm: clvmd: dlm_recover_locks 0 in
> >> >> 2018-06-28T09:03:30.101227+02:00 siteb-2 kernel: [80910.025459] dlm: clvmd: dlm_recover_rsbs 2 done
> >> >> 2018-06-28T09:03:30.101227+02:00 siteb-2 kernel: [80910.025470] dlm: clvmd: dlm_recover 9 generation 8 done: 0 ms
> >> >> 2018-06-28T09:03:30.103542+02:00 siteb-2 pengine[189259]: notice: Watchdog will be used via SBD if fencing is required
> >> >> 2018-06-28T09:03:30.103660+02:00 siteb-2 pengine[189259]: notice: On loss of CCM Quorum: Ignore
> >> >> 2018-06-28T09:03:30.106439+02:00 siteb-2 pengine[189259]: notice: Stop dlm:0#011(sitea-1)
> >> >> 2018-06-28T09:03:30.106528+02:00 siteb-2 pengine[189259]: notice: Start k45RG#011(siteb-2)
> >> >> 2018-06-28T09:03:30.107002+02:00 siteb-2 pengine[189259]: notice: Calculated transition 1058, saving inputs in /var/lib/pacemaker/pengine/pe-input-2322.bz2
> >> >> 2018-06-28T09:03:30.107292+02:00 siteb-2 crmd[189260]: notice: Processing graph 1058 (ref=pe_calc-dc-1530169410-1356) derived from /var/lib/pacemaker/pengine/pe-input-2322.bz2
> >> >> 2018-06-28T09:03:30.107384+02:00 siteb-2 crmd[189260]: notice: Initiating start operation k45RG_start_0 locally on siteb-2
> >> >> 2018-06-28T09:03:30.117125+02:00 siteb-2 lrmd[189257]: notice: executing - rsc:k45RG action:start call_id:108
> >> >> 2018-06-28T09:03:30.117427+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation dlm_stop_0 on sitea-1
> >> >> 2018-06-28T09:03:44.279235+02:00 siteb-2 lrmd[189257]: notice: finished - rsc:k45RG action:start call_id:108 pid:254501 exit-code:0 exec-time:14162ms queue-time:0ms
> >> >> 2018-06-28T09:03:44.287804+02:00 siteb-2 crmd[189260]: notice: Result of start operation for k45RG on siteb-2: 0 (ok)
> >> >> 2018-06-28T09:03:44.289761+02:00 siteb-2 crmd[189260]: notice: Initiating monitor operation k45RG_monitor_300000 locally on siteb-2
> >> >> 2018-06-28T09:03:45.952776+02:00 siteb-2 crmd[189260]: notice: Transition 1058 (Complete=8, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2322.bz2): Complete
> >> >> 2018-06-28T09:03:45.952976+02:00 siteb-2 crmd[189260]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
> >> >> -------------------
> >> >> Node sitea-1:
> >> >> .....
> >> >> 2018-06-28T09:02:23.504692+02:00 sitea-1 LVM(vg1)[170755]: INFO: Deactivating volume group clustervg
> >> >> 2018-06-28T09:02:23.937007+02:00 sitea-1 multipathd[3567]: dm-12: remove map (uevent)
> >> >> 2018-06-28T09:02:23.937371+02:00 sitea-1 multipathd[3567]: dm-12: devmap not registered, can't remove
> >> >> 2018-06-28T09:02:24.193367+02:00 sitea-1 kernel: [59480.678902] dlm: clvmd: dlm_recover 13
> >> >> 2018-06-28T09:02:24.193379+02:00 sitea-1 kernel: [59480.678938] dlm: clvmd: remove member 3
> >> >> 2018-06-28T09:02:24.393411+02:00 sitea-1 kernel: [59480.878296] dlm: clvmd: dlm_recover_members 3 nodes
> >> >> 2018-06-28T09:02:24.593412+02:00 sitea-1 kernel: [59481.078227] dlm: clvmd: generation 7 slots 3 1:2 2:4 3:1
> >> >> 2018-06-28T09:02:24.593423+02:00 sitea-1 kernel: [59481.078232] dlm: clvmd: dlm_recover_directory
> >> >> 2018-06-28T09:02:24.793410+02:00 sitea-1 kernel: [59481.278263] dlm: clvmd: dlm_recover_directory 0 in 0 new
> >> >> 2018-06-28T09:02:25.017465+02:00 sitea-1 multipathd[3567]: dm-12: remove map (uevent)
> >> >> 2018-06-28T09:02:25.017639+02:00 sitea-1 multipathd[3567]: dm-13: remove map (uevent)
> >> >> 2018-06-28T09:02:25.017724+02:00 sitea-1 multipathd[3567]: dm-13: devmap not registered, can't remove
> >> >> 2018-06-28T09:02:25.017801+02:00 sitea-1 multipathd[3567]: dm-13: remove map (uevent)
> >> >> 2018-06-28T09:02:25.589352+02:00 sitea-1 kernel: [59482.074179] dlm: clvmd: dlm_recover_directory 1 out 2 messages
> >> >> 2018-06-28T09:02:25.589357+02:00 sitea-1 kernel: [59482.074192] dlm: clvmd: dlm_recover_masters
> >> >> 2018-06-28T09:02:25.589357+02:00 sitea-1 kernel: [59482.074201] dlm: clvmd: dlm_recover_masters 1 of 3
> >> >> 2018-06-28T09:02:25.591207+02:00 sitea-1 LVM(vg1)[170755]: INFO: 0 logical volume(s) in volume group "clustervg" now active
> >> >> 2018-06-28T09:02:25.596699+02:00 sitea-1 LVM(vg1)[170755]: INFO: LVM Volume clustervg is not available (stopped)
> >> >> 2018-06-28T09:02:25.600665+02:00 sitea-1 lrmd[9025]: notice: finished - rsc:vg1 action:stop call_id:80 pid:170755 exit-code:0 exec-time:2315ms queue-time:0ms
> >> >> 2018-06-28T09:02:25.600897+02:00 sitea-1 crmd[9031]: notice: Result of stop operation for vg1 on sitea-1: 0 (ok)
> >> >> 2018-06-28T09:02:25.604928+02:00 sitea-1 lrmd[9025]: notice: executing - rsc:clvm action:stop call_id:83
> >> >> 2018-06-28T09:02:25.617437+02:00 sitea-1 clvmd(clvm)[171004]: INFO: Stopping clvm
> >> >> 2018-06-28T09:02:25.626327+02:00 sitea-1 clvmd(clvm)[171004]: INFO: Stopping clvmd
> >> >> 2018-06-28T09:02:25.745414+02:00 sitea-1 kernel: [59482.230710] dlm: clvmd: dlm_recover_locks 0 out
> >> >> 2018-06-28T09:02:26.389417+02:00 sitea-1 kernel: [59482.874050] dlm: clvmd: dlm_recover_locks 0 in
> >> >> 2018-06-28T09:02:26.389429+02:00 sitea-1 kernel: [59482.874056] dlm: clvmd: dlm_recover_rsbs 3 done
> >> >> 2018-06-28T09:02:26.988133+02:00 sitea-1 kernel: [59483.474046] dlm: clvmd: dlm_recover 13 generation 7 done: 2396 ms
> >> >> 2018-06-28T09:02:26.988142+02:00 sitea-1 kernel: [59483.474240] dlm: clvmd: leaving the lockspace group...
> >> >> 2018-06-28T09:02:26.988142+02:00 sitea-1 kernel: [59483.475630] dlm: clvmd: group event done 0 0
> >> >> 2018-06-28T09:02:26.988143+02:00 sitea-1 kernel: [59483.476068] dlm: clvmd: release_lockspace final free
> >> >> 2018-06-28T09:02:27.652248+02:00 sitea-1 clvmd(clvm)[171004]: INFO: Stopping cmirrord
> >> >> 2018-06-28T09:02:28.666056+02:00 sitea-1 lrmd[9025]: notice: finished - rsc:clvm action:stop call_id:83 pid:171004 exit-code:0 exec-time:3061ms queue-time:0ms
> >> >> 2018-06-28T09:02:28.666439+02:00 sitea-1 crmd[9031]: notice: Result of stop operation for clvm on sitea-1: 0 (ok)
> >> >> 2018-06-28T09:02:38.413748+02:00 sitea-1 corosync[6661]: [TOTEM ] A processor failed, forming new configuration.
> >> >> 2018-06-28T09:02:52.079905+02:00 sitea-1 corosync[6661]: [TOTEM ] A new membership (192.168.121.55:2012) was formed. Members left: 2
> >> >> 2018-06-28T09:02:52.080306+02:00 sitea-1 corosync[6661]: [TOTEM ] Failed to receive the leave message. failed: 2
> >> >> 2018-06-28T09:02:52.082619+02:00 sitea-1 cib[9021]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.082790+02:00 sitea-1 cib[9021]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.082895+02:00 sitea-1 attrd[9026]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.082993+02:00 sitea-1 attrd[9026]: notice: Removing all siteb-1 attributes for peer loss
> >> >> 2018-06-28T09:02:52.083075+02:00 sitea-1 attrd[9026]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.083156+02:00 sitea-1 stonith-ng[9022]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.083253+02:00 sitea-1 stonith-ng[9022]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.083335+02:00 sitea-1 pacemakerd[8546]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.083429+02:00 sitea-1 corosync[6661]: [QUORUM] Members[3]: 1 3 4
> >> >> 2018-06-28T09:02:52.083521+02:00 sitea-1 corosync[6661]: [MAIN ] Completed service synchronization, ready to provide service.
> >> >> 2018-06-28T09:02:52.083606+02:00 sitea-1 crmd[9031]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.084290+02:00 sitea-1 dlm_controld[73416]: 59514 fence request 2 pid 171087 nodedown time 1530169372 fence_all dlm_stonith
> >> >> 2018-06-28T09:02:52.085446+02:00 sitea-1 kernel: [59508.568940] dlm: closing connection to node 2
> >> >> 2018-06-28T09:02:52.109393+02:00 sitea-1 dlm_stonith: stonith_api_time: Found 0 entries for 2/(null): 0 in progress, 0 completed
> >> >> 2018-06-28T09:02:52.110167+02:00 sitea-1 stonith-ng[9022]: notice: Client stonith-api.171087.d3c59fc2 wants to fence (reboot) '2' with device '(any)'
> >> >> 2018-06-28T09:02:52.113257+02:00 sitea-1 stonith-ng[9022]: notice: Requesting peer fencing (reboot) of siteb-1
> >> >> 2018-06-28T09:03:29.096714+02:00 sitea-1 stonith-ng[9022]: notice: Operation reboot of siteb-1 by sitea-2 for [email protected]: OK
> >> >> 2018-06-28T09:03:29.097152+02:00 sitea-1 stonith-api[171087]: stonith_api_kick: Node 2/(null) kicked: reboot
> >> >> 2018-06-28T09:03:29.097426+02:00 sitea-1 crmd[9031]: notice: Peer lnx0361b was terminated (reboot) by sitea-2 on behalf of stonith-api.171087: OK
> >> >> 2018-06-28T09:03:30.098657+02:00 sitea-1 dlm_controld[73416]: 59552 fence result 2 pid 171087 result 0 exit status
> >> >> 2018-06-28T09:03:30.099730+02:00 sitea-1 dlm_controld[73416]: 59552 fence status 2 receive 0 from 1 walltime 1530169410 local 59552
> >> >> 2018-06-28T09:03:30.118114+02:00 sitea-1 lrmd[9025]: notice: executing - rsc:dlm action:stop call_id:85
> >> >> 2018-06-28T09:03:30.149362+02:00 sitea-1 kernel: [59546.630487] dlm: closing connection to node 4
> >> >> 2018-06-28T09:03:30.149370+02:00 sitea-1 kernel: [59546.630496] dlm: closing connection to node 3
> >> >> 2018-06-28T09:03:30.149371+02:00 sitea-1 kernel: [59546.630501] dlm: closing connection to node 1
> >> >> 2018-06-28T09:03:31.168356+02:00 sitea-1 lrmd[9025]: notice: finished - rsc:dlm action:stop call_id:85 pid:171235 exit-code:0 exec-time:1050ms queue-time:0ms
> >> >> 2018-06-28T09:03:31.168768+02:00 sitea-1 crmd[9031]: notice: Result of stop operation for dlm on lnx0361a: 0 (ok)
> >> >>
> >> >> -------------------
> >> >> Node sitea-2:
> >> >> ...
> >> >> 2018-06-28T09:02:23.285402+02:00 sitea-2 lrmd[9657]: notice: executing - rsc:vg1 action:stop call_id:78
> >> >> 2018-06-28T09:02:23.286028+02:00 sitea-2 lrmd[9657]: notice: executing - rsc:sysinfo action:stop call_id:80
> >> >> 2018-06-28T09:02:23.293918+02:00 sitea-2 lrmd[9657]: notice: finished - rsc:sysinfo action:stop call_id:80 pid:228481 exit-code:0 exec-time:8ms queue-time:0ms
> >> >> 2018-06-28T09:02:23.294070+02:00 sitea-2 crmd[9660]: notice: Result of stop operation for sysinfo on sitea-2: 0 (ok)
> >> >> 2018-06-28T09:02:23.448868+02:00 sitea-2 LVM(vg1)[228480]: INFO: Deactivating volume group clustervg
> >> >> 2018-06-28T09:02:23.494608+02:00 sitea-2 multipathd[3459]: dm-12: remove map (uevent)
> >> >> 2018-06-28T09:02:23.494774+02:00 sitea-2 multipathd[3459]: dm-12: devmap not registered, can't remove
> >> >> 2018-06-28T09:02:24.147057+02:00 sitea-2 LVM(vg1)[228480]: INFO: 0 logical volume(s) in volume group "clustervg" now active
> >> >> 2018-06-28T09:02:24.152209+02:00 sitea-2 LVM(vg1)[228480]: INFO: LVM Volume clustervg is not available (stopped)
> >> >> 2018-06-28T09:02:24.155824+02:00 sitea-2 lrmd[9657]: notice: finished - rsc:vg1 action:stop call_id:78 pid:228480 exit-code:0 exec-time:871ms queue-time:0ms
> >> >> 2018-06-28T09:02:24.156133+02:00 sitea-2 crmd[9660]: notice: Result of stop operation for vg1 on sitea-2: 0 (ok)
> >> >> 2018-06-28T09:02:24.159361+02:00 sitea-2 lrmd[9657]: notice: executing - rsc:clvm action:stop call_id:81
> >> >> 2018-06-28T09:02:24.169269+02:00 sitea-2 clvmd(clvm)[229478]: INFO: Stopping clvm
> >> >> 2018-06-28T09:02:24.177998+02:00 sitea-2 clvmd(clvm)[229478]: INFO: Stopping clvmd
> >> >> 2018-06-28T09:02:24.187070+02:00 sitea-2 kernel: [59548.133661] dlm: clvmd: leaving the lockspace group...
> >> >> 2018-06-28T09:02:24.187078+02:00 sitea-2 kernel: [59548.134502] dlm: clvmd: group event done 0 0
> >> >> 2018-06-28T09:02:24.187078+02:00 sitea-2 kernel: [59548.134863] dlm: clvmd: release_lockspace final free
> >> >> 2018-06-28T09:02:24.707189+02:00 sitea-2 multipathd[3459]: dm-12: remove map (uevent)
> >> >> 2018-06-28T09:02:24.707584+02:00 sitea-2 multipathd[3459]: dm-13: remove map (uevent)
> >> >> 2018-06-28T09:02:24.707672+02:00 sitea-2 multipathd[3459]: dm-13: devmap not registered, can't remove
> >> >> 2018-06-28T09:02:24.707754+02:00 sitea-2 multipathd[3459]: dm-13: remove map (uevent)
> >> >> 2018-06-28T09:02:25.193751+02:00 sitea-2 clvmd(clvm)[229478]: INFO: Stopping cmirrord
> >> >> 2018-06-28T09:02:26.204708+02:00 sitea-2 lrmd[9657]: notice: finished - rsc:clvm action:stop call_id:81 pid:229478 exit-code:0 exec-time:2046ms queue-time:0ms
> >> >> 2018-06-28T09:02:26.204942+02:00 sitea-2 crmd[9660]: notice: Result of stop operation for clvm on sitea-2: 0 (ok)
> >> >> 2018-06-28T09:02:26.208588+02:00 sitea-2 lrmd[9657]: notice: executing - rsc:dlm action:stop call_id:83
> >> >> 2018-06-28T09:02:26.236329+02:00 sitea-2 kernel: [59550.182284] dlm: closing connection to node 4
> >> >> 2018-06-28T09:02:26.236336+02:00 sitea-2 kernel: [59550.182293] dlm: closing connection to node 3
> >> >> 2018-06-28T09:02:26.236337+02:00 sitea-2 kernel: [59550.182299] dlm: closing connection to node 2
> >> >> 2018-06-28T09:02:26.236337+02:00 sitea-2 kernel: [59550.182306] dlm: closing connection to node 1
> >> >> 2018-06-28T09:02:27.248204+02:00 sitea-2 lrmd[9657]: notice: finished - rsc:dlm action:stop call_id:83 pid:230674 exit-code:0 exec-time:1039ms queue-time:0ms
> >> >> 2018-06-28T09:02:27.248462+02:00 sitea-2 crmd[9660]: notice: Result of stop operation for dlm on sitea-2: 0 (ok)
> >> >> 2018-06-28T09:02:30.454215+02:00 sitea-2 lrmd[9657]: notice: finished - rsc:k45RG action:stop call_id:76 pid:228479 exit-code:0 exec-time:7169ms queue-time:0ms
> >> >> 2018-06-28T09:02:30.454595+02:00 sitea-2 crmd[9660]: notice: Result of stop operation for k45RG on sitea-2: 0 (ok)
> >> >> 2018-06-28T09:02:38.412808+02:00 sitea-2 corosync[6570]: [TOTEM ] A processor failed, forming new configuration.
> >> >> 2018-06-28T09:02:52.078249+02:00 sitea-2 corosync[6570]: [TOTEM ] A new membership (192.168.121.55:2012) was formed. Members left: 2
> >> >> 2018-06-28T09:02:52.078359+02:00 sitea-2 corosync[6570]: [TOTEM ] Failed to receive the leave message. failed: 2
> >> >> 2018-06-28T09:02:52.081949+02:00 sitea-2 cib[9655]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.082048+02:00 sitea-2 attrd[9658]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.082138+02:00 sitea-2 cib[9655]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.082220+02:00 sitea-2 attrd[9658]: notice: Removing all siteb-1 attributes for peer loss
> >> >> 2018-06-28T09:02:52.082300+02:00 sitea-2 stonith-ng[9656]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.082391+02:00 sitea-2 stonith-ng[9656]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.082470+02:00 sitea-2 attrd[9658]: notice: Purged 1 peer with id=2 and/or uname=siteb-1 from the membership cache
> >> >> 2018-06-28T09:02:52.082653+02:00 sitea-2 corosync[6570]: [QUORUM] Members[3]: 1 3 4
> >> >> 2018-06-28T09:02:52.082739+02:00 sitea-2 corosync[6570]: [MAIN ] Completed service synchronization, ready to provide service.
> >> >> 2018-06-28T09:02:52.082819+02:00 sitea-2 pacemakerd[9653]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.082908+02:00 sitea-2 crmd[9660]: notice: Node siteb-1 state is now lost
> >> >> 2018-06-28T09:02:52.495697+02:00 sitea-2 stonith-ng[9656]: notice: stonith-sbd can fence (reboot) siteb-1: dynamic-list
> >> >> 2018-06-28T09:02:52.495902+02:00 sitea-2 stonith-ng[9656]: notice: Delaying reboot on stonith-sbd for 25358ms (timeout=300s)
> >> >> 2018-06-28T09:03:01.357937+02:00 sitea-2 cron[231205]: pam_unix(crond:session): session opened for user root by (uid=0)
> >> >> 2018-06-28T09:03:29.093957+02:00 sitea-2 stonith-ng[9656]: notice: Operation 'reboot' [231293] (call 2 from stonith-api.171087) for host 'siteb-1' with device 'stonith-sbd' returned: 0 (OK)
> >> >> 2018-06-28T09:03:29.096254+02:00 sitea-2 stonith-ng[9656]: notice: Operation reboot of siteb-1 by sitea-2 for [email protected]: OK
> >> >> 2018-06-28T09:03:29.096769+02:00 sitea-2 crmd[9660]: notice: Peer siteb-1 was terminated (reboot) by sitea-2 on behalf of stonith-api.171087: OK
> >> >> -------------------
> >> >> Node siteb-1 has no entries during this timeframe.
> >> >>
> >> >> During standby corosync should be up and running - so might the
> >> >> "Failed to receive the leave message" be a problem?
> >> >>
> >> >> regards
> >> >> Philipp
> >> >>
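A quick way to check what corosync itself thinks of a standby node - a
sketch, to be run on any of the remaining members - would be something
like:

  # quorum state and current member list
  corosync-quorumtool -s

  # or dump the runtime membership from the object database
  corosync-cmapctl | grep members

A node in Pacemaker standby should still appear as a corosync member; if
it vanishes from this list, corosync (not Pacemaker) lost it, which would
fit the "Failed to receive the leave message" line.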
> >> >>> From: "Ulrich Windl" <[email protected]>
> >> >>> To: <[email protected]>
> >> >>> Date: 16.07.2018 08:33
> >> >>> Subject: [ClusterLabs] Re: corosync/dlm fencing?
> >> >>> Sent by: "Users" <[email protected]>
> >> >>>
> >> >>> Hi!
> >> >>>
> >> >>> I don't run SLES 12, but in SLES 11 some RAs tried to unload kernel
> >> >>> modules during stop, and when you had installed kernel updates
> >> >>> before, the operation failed and the cluster panicked (i.e. started
> >> >>> to fence the node). In your case it should be obvious from the logs
> >> >>> why the cluster wants to fence nodes. Maybe
> >> >>> "grep -i fence /var/log/messages" (or the same in pacemaker's log)
> >> >>> helps.
> >> >>>
> >> >>> Regards,
> >> >>> Ulrich
> >> >>>
> >> >>> >>> Philipp Achmüller <[email protected]> wrote on 15.07.2018 at 15:39 in message <[email protected]>:
> >> >>> > hi!
> >> >>> >
> >> >>> > I have a 4-node cluster running on SLES12 SP3:
> >> >>> > - pacemaker-1.1.16-4.8.x86_64
> >> >>> > - corosync-2.3.6-9.5.1.x86_64
> >> >>> >
> >> >>> > with the following configuration:
> >> >>> >
> >> >>> > Stack: corosync
> >> >>> > Current DC: sitea-2 (version 1.1.16-4.8-77ea74d) - partition with quorum
> >> >>> > Last updated: Sun Jul 15 15:00:55 2018
> >> >>> > Last change: Sat Jul 14 18:54:50 2018 by root via crm_resource on sitea-1
> >> >>> >
> >> >>> > 4 nodes configured
> >> >>> > 23 resources configured
> >> >>> >
> >> >>> > Node sitea-1: online
> >> >>> > 1 (ocf::pacemaker:controld): Active
> >> >>> > 1 (ocf::lvm2:clvmd): Active
> >> >>> > 1 (ocf::pacemaker:SysInfo): Active
> >> >>> > 5 (ocf::heartbeat:VirtualDomain): Active
> >> >>> > 1 (ocf::heartbeat:LVM): Active
> >> >>> > Node siteb-1: online
> >> >>> > 1 (ocf::pacemaker:controld): Active
> >> >>> > 1 (ocf::lvm2:clvmd): Active
> >> >>> > 1 (ocf::pacemaker:SysInfo): Active
> >> >>> > 1 (ocf::heartbeat:VirtualDomain): Active
> >> >>> > 1 (ocf::heartbeat:LVM): Active
> >> >>> > Node sitea-2: online
> >> >>> > 1 (ocf::pacemaker:controld): Active
> >> >>> > 1 (ocf::lvm2:clvmd): Active
> >> >>> > 1 (ocf::pacemaker:SysInfo): Active
> >> >>> > 3 (ocf::heartbeat:VirtualDomain): Active
> >> >>> > 1 (ocf::heartbeat:LVM): Active
> >> >>> > Node siteb-2: online
> >> >>> > 1 (ocf::pacemaker:ClusterMon): Active
> >> >>> > 3 (ocf::heartbeat:VirtualDomain): Active
> >> >>> > 1 (ocf::pacemaker:SysInfo): Active
> >> >>> > 1 (stonith:external/sbd): Active
> >> >>> > 1 (ocf::lvm2:clvmd): Active
> >> >>> > 1 (ocf::heartbeat:LVM): Active
> >> >>> > ----
> >> >>> > and these ordering constraints:
> >> >>> > ...
> >> >>> > group base-group dlm clvm vg1
> >> >>> > clone base-clone base-group \
> >> >>> >     meta interleave=true target-role=Started ordered=true
> >> >>> > colocation colocation-VM-base-clone-INFINITY inf: VM base-clone
> >> >>> > order order-base-clone-VM-mandatory base-clone:start VM:start
> >> >>> > ...
> >> >>> >
> >> >>> > For maintenance I would like to put 1 or 2 nodes from "sitea"
> >> >>> > into standby so that all resources move off these two images.
> >> >>> > Everything works fine until dlm stops as the last resource on
> >> >>> > these nodes; then dlm_controld sends a fence request - sometimes
> >> >>> > against the remaining online nodes, so that only 1 online node
> >> >>> > is left in the cluster....
> >> >>> >
> >> >>> > messages:
> >> >>> >
> >> >>> > ....
> >> >>> > 2018-07-14T14:38:56.441157+02:00 siteb-1 dlm_controld[39725]: 678 fence request 3 pid 54428 startup time 1531571371 fence_all dlm_stonith
> >> >>> > 2018-07-14T14:38:56.445284+02:00 siteb-1 dlm_stonith: stonith_api_time: Found 0 entries for 3/(null): 0 in progress, 0 completed
> >> >>> > 2018-07-14T14:38:56.446033+02:00 siteb-1 stonith-ng[8085]: notice: Client stonith-api.54428.ee6a7e02 wants to fence (reboot) '3' with device '(any)'
> >> >>> > 2018-07-14T14:38:56.446294+02:00 siteb-1 stonith-ng[8085]: notice: Requesting peer fencing (reboot) of sitea-2
> >> >>> > ...
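For the maintenance case described above, one way to see what happened
after such a fence request - a sketch, using the node names from this
thread - is to ask stonith-ng for its history and to check which
lockspaces dlm still holds on the node being taken out:

  # fencing history for the node that was shot
  stonith_admin --history sitea-2

  # on the node entering standby: list remaining DLM lockspaces;
  # ideally this is empty once the dlm resource has stopped
  dlm_tool ls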
> >> >>> >
> >> >>> > # dlm_tool dump_config
> >> >>> > daemon_debug=0
> >> >>> > foreground=0
> >> >>> > log_debug=0
> >> >>> > timewarn=0
> >> >>> > protocol=detect
> >> >>> > debug_logfile=0
> >> >>> > enable_fscontrol=0
> >> >>> > enable_plock=1
> >> >>> > plock_debug=0
> >> >>> > plock_rate_limit=0
> >> >>> > plock_ownership=0
> >> >>> > drop_resources_time=10000
> >> >>> > drop_resources_count=10
> >> >>> > drop_resources_age=10000
> >> >>> > post_join_delay=30
> >> >>> > enable_fencing=1
> >> >>> > enable_concurrent_fencing=0
> >> >>> > enable_startup_fencing=0
> >> >>> > repeat_failed_fencing=1
> >> >>> > enable_quorum_fencing=1
> >> >>> > enable_quorum_lockspace=1
> >> >>> > help=-1
> >> >>> > version=-1
> >> >>> >
> >> >>> > How can I find out what is happening here?
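Since dump_config shows log_debug=0, a possible next step - a sketch; see
dlm.conf(5) for the exact option names - is to raise dlm_controld's debug
level and capture its internal state right after reproducing the standby:

  # /etc/dlm/dlm.conf - enable daemon debug logging
  log_debug=1

  # after reproducing the problem:
  dlm_tool dump      # dlm_controld's internal debug buffer
  dlm_tool status    # daemon view of nodes and fencing state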
_______________________________________________
Users mailing list: [email protected]
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
