Hi! I have a 4-node cluster running on SLES 12 SP3 (pacemaker-1.1.16-4.8.x86_64, corosync-2.3.6-9.5.1.x86_64) with the
following configuration:

Stack: corosync
Current DC: sitea-2 (version 1.1.16-4.8-77ea74d) - partition with quorum
Last updated: Sun Jul 15 15:00:55 2018
Last change: Sat Jul 14 18:54:50 2018 by root via crm_resource on sitea-1

4 nodes configured
23 resources configured

Node sitea-1: online
        1  (ocf::pacemaker:controld):      Active
        1  (ocf::lvm2:clvmd):              Active
        1  (ocf::pacemaker:SysInfo):       Active
        5  (ocf::heartbeat:VirtualDomain): Active
        1  (ocf::heartbeat:LVM):           Active
Node siteb-1: online
        1  (ocf::pacemaker:controld):      Active
        1  (ocf::lvm2:clvmd):              Active
        1  (ocf::pacemaker:SysInfo):       Active
        1  (ocf::heartbeat:VirtualDomain): Active
        1  (ocf::heartbeat:LVM):           Active
Node sitea-2: online
        1  (ocf::pacemaker:controld):      Active
        1  (ocf::lvm2:clvmd):              Active
        1  (ocf::pacemaker:SysInfo):       Active
        3  (ocf::heartbeat:VirtualDomain): Active
        1  (ocf::heartbeat:LVM):           Active
Node siteb-2: online
        1  (ocf::pacemaker:ClusterMon):    Active
        3  (ocf::heartbeat:VirtualDomain): Active
        1  (ocf::pacemaker:SysInfo):       Active
        1  (stonith:external/sbd):         Active
        1  (ocf::lvm2:clvmd):              Active
        1  (ocf::heartbeat:LVM):           Active
        1  (ocf::pacemaker:controld):      Active

and these ordering/colocation constraints:

...
group base-group dlm clvm vg1
clone base-clone base-group \
        meta interleave=true target-role=Started ordered=true
colocation colocation-VM-base-clone-INFINITY inf: VM base-clone
order order-base-clone-VM-mandatory base-clone:start VM:start
...

For maintenance I would like to put one or two of the "sitea" nodes into standby, so that all resources move off those two nodes (the commands I use are at the end of this mail). Everything works fine until dlm stops as the last resource on those nodes; then dlm_controld sends a fence request - sometimes against one of the remaining online nodes, so that only one node is left online in the cluster.

messages:

....
2018-07-14T14:38:56.441157+02:00 siteb-1 dlm_controld[39725]: 678 fence request 3 pid 54428 startup time 1531571371 fence_all dlm_stonith
2018-07-14T14:38:56.445284+02:00 siteb-1 dlm_stonith: stonith_api_time: Found 0 entries for 3/(null): 0 in progress, 0 completed
2018-07-14T14:38:56.446033+02:00 siteb-1 stonith-ng[8085]:  notice: Client stonith-api.54428.ee6a7e02 wants to fence (reboot) '3' with device '(any)'
2018-07-14T14:38:56.446294+02:00 siteb-1 stonith-ng[8085]:  notice: Requesting peer fencing (reboot) of sitea-2
...

# dlm_tool dump_config
daemon_debug=0
foreground=0
log_debug=0
timewarn=0
protocol=detect
debug_logfile=0
enable_fscontrol=0
enable_plock=1
plock_debug=0
plock_rate_limit=0
plock_ownership=0
drop_resources_time=10000
drop_resources_count=10
drop_resources_age=10000
post_join_delay=30
enable_fencing=1
enable_concurrent_fencing=0
enable_startup_fencing=0
repeat_failed_fencing=1
enable_quorum_fencing=1
enable_quorum_lockspace=1
help=-1
version=-1

How can I find out what is happening here?
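
For reference, the standby itself is done with plain crmsh (as shipped with SLES 12), roughly like this, nothing special:

  # crm node standby sitea-1      <- resources migrate off, dlm/clvmd stop last
  # crm node standby sitea-2
  ... maintenance ...
  # crm node online sitea-1
  # crm node online sitea-2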
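
So far the only thing I have looked at on the dlm side is the dlm_tool dump_config output above. I assume the next step would be something like the following, run on the node that issued the fence request (siteb-1 in the log above) - corrections or better ideas are very welcome:

  # dlm_tool dump                     (dlm_controld debug buffer around the fence request)
  # dlm_tool ls                       (lockspaces and their current members)
  # dlm_tool status                   (per-node state as dlm_controld sees it)
  # stonith_admin --history sitea-2   (pacemaker fencing history for the fenced node)
  # corosync-quorumtool -s            (membership/quorum view from corosync)
  # grep dlm_controld /var/log/messages   (full dlm_controld log around the standby)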