On Tue, 2018-06-19 at 16:17 +0200, Stefan Krueger wrote:
> Hi Ken,
>
> thanks for the help!
> I created a stonith device and removed the no-quorum-policy.
>
> That didn't change anything, so I deleted the orders, (co)locations,
> and one resource (nfs-server). At first it worked fine, but stopping
> the cluster via 'pcs cluster stop' took forever; it seemed to have a
> problem with the NFS server. I tried to stop it manually via
> systemctl stop nfs-server, but that didn't help; the nfs-server
> wouldn't stop. So I reset the server. Everything should then have
> moved to the other node, but that didn't happen either :(
>
> Manually I can start/stop the nfs-server without any problems (nobody
> has mounted the NFS share yet):
> systemctl start nfs-server.service ; sleep 5; systemctl status nfs-server.service ; sleep 5; systemctl stop nfs-server
>
> So, once again, my resources won't start:
> pcs status
> Cluster name: zfs-vmstorage
> Stack: corosync
> Current DC: zfs-serv3 (version 1.1.16-94ff4df) - partition with quorum
> Last updated: Tue Jun 19 16:15:37 2018
> Last change: Tue Jun 19 15:41:24 2018 by hacluster via crmd on zfs-serv4
>
> 2 nodes configured
> 5 resources configured
>
> Online: [ zfs-serv3 zfs-serv4 ]
>
> Full list of resources:
>
>  vm_storage   (ocf::heartbeat:ZFS):      Stopped
>  ha-ip        (ocf::heartbeat:IPaddr2):  Stopped
>  resIPMI-zfs4 (stonith:external/ipmi):   Started zfs-serv3
>  resIPMI-zfs3 (stonith:external/ipmi):   Started zfs-serv4
>  nfs-server   (systemd:nfs-server):      Stopped
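When resources sit in Stopped with no failures shown, it helps to see what the scheduler actually decided. A minimal sketch of how to dig that out, assuming the detail log lives at /var/log/pacemaker.log (the path and daemon names vary by distro and Pacemaker version):

```shell
# Scheduler (pengine) decisions: which actions it planned, if any.
grep -E 'pengine.*(Start|Stop|Move|Leave)' /var/log/pacemaker.log | tail -n 20

# crmd carrying out (or failing to carry out) those actions.
grep -E 'crmd.*(Initiating|Result)' /var/log/pacemaker.log | tail -n 20

# Alternatively, ask the scheduler directly what it would do with the
# current live configuration (-L = live cluster, -s = show scores).
crm_simulate -sL
```

If crm_simulate shows no start actions at all, the problem is in the configuration (constraints, target-role, properties) rather than in the resource agents.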
I'd check the logs for more information. It's odd that status doesn't
show any failures, which suggests the cluster didn't schedule any
actions. The system log will have the most essential information. The
detail log (usually /var/log/pacemaker.log or
/var/log/cluster/corosync.log) will have extended information. The most
interesting will be messages from the pengine with actions to be
scheduled ("Start", etc.). Then there should be messages from the crmd
about "Initiating" the command and obtaining its "Result".

> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: active/enabled
>
> pcs config
> Cluster Name: zfs-vmstorage
> Corosync Nodes:
>  zfs-serv3 zfs-serv4
> Pacemaker Nodes:
>  zfs-serv3 zfs-serv4
>
> Resources:
>  Resource: vm_storage (class=ocf provider=heartbeat type=ZFS)
>   Attributes: pool=vm_storage importargs="-d /dev/disk/by-vdev/"
>   Operations: monitor interval=5s timeout=30s (vm_storage-monitor-interval-5s)
>               start interval=0s timeout=90 (vm_storage-start-interval-0s)
>               stop interval=0s timeout=90 (vm_storage-stop-interval-0s)
>  Resource: ha-ip (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: ip=172.16.101.73 cidr_netmask=16
>   Operations: start interval=0s timeout=20s (ha-ip-start-interval-0s)
>               stop interval=0s timeout=20s (ha-ip-stop-interval-0s)
>               monitor interval=10s timeout=20s (ha-ip-monitor-interval-10s)
>  Resource: nfs-server (class=systemd type=nfs-server)
>   Operations: start interval=0s timeout=100 (nfs-server-start-interval-0s)
>               stop interval=0s timeout=100 (nfs-server-stop-interval-0s)
>               monitor interval=60 timeout=100 (nfs-server-monitor-interval-60)
>
> Stonith Devices:
>  Resource: resIPMI-zfs4 (class=stonith type=external/ipmi)
>   Attributes: hostname=ipmi-zfs-serv4 ipaddr=172.xx.xx.17 userid=USER passwd=GEHEIM interface=lan
>   Operations: monitor interval=60s (resIPMI-zfs4-monitor-interval-60s)
>  Resource: resIPMI-zfs3 (class=stonith type=external/ipmi)
>   Attributes: hostname=ipmi-zfs-serv3 ipaddr=172.xx.xx.16 userid=USER passwd=GEHEIM interface=lan
>   Operations: monitor interval=60s (resIPMI-zfs3-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: resIPMI-zfs3
>     Disabled on: zfs-serv3 (score:-INFINITY) (id:location-resIPMI-zfs3-zfs-serv3--INFINITY)
>   Resource: resIPMI-zfs4
>     Disabled on: zfs-serv4 (score:-INFINITY) (id:location-resIPMI-zfs4-zfs-serv4--INFINITY)
> Ordering Constraints:
>   Resource Sets:
>     set nfs-server vm_storage ha-ip action=start (id:pcs_rsc_set_nfs-server_vm_storage_ha-ip) (id:pcs_rsc_order_set_nfs-server_vm_storage_ha-ip)
>     set ha-ip nfs-server vm_storage action=stop (id:pcs_rsc_set_ha-ip_nfs-server_vm_storage) (id:pcs_rsc_order_set_ha-ip_nfs-server_vm_storage)
> Colocation Constraints:
>   Resource Sets:
>     set ha-ip nfs-server vm_storage (id:colocation-ha-ip-nfs-server-INFINITY-0) setoptions score=INFINITY (id:colocation-ha-ip-nfs-server-INFINITY)

I don't think your constraints are causing problems, but sets can be
difficult to follow. Your ordering/colocation constraints could be more
simply expressed as a group of nfs-server vm_storage ha-ip. With a
group, the cluster will do both ordering and colocation, in forward
order for start, and reverse order for stop.

> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  resource-stickiness: 100
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: zfs-vmstorage
>  dc-version: 1.1.16-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1528814481
>  no-quorum-policy: stop
>  stonith-enabled: false

^^^ You have to explicitly set stonith-enabled to true, since it was
set to false earlier.

BTW, IPMI is a good fencing method, but it has a problem if it's
on-board: if the host loses power entirely, IPMI will not respond, the
fencing will fail, and the cluster will be unable to recover.
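The two suggestions above, replacing the constraint sets with a group and re-enabling fencing, could look roughly like this with pcs (a sketch: the constraint ids are copied from the pcs config output, the group name "nfs-group" is made up, and exact pcs syntax differs between versions):

```shell
# Drop the old set-based ordering/colocation constraints by id.
pcs constraint remove pcs_rsc_order_set_nfs-server_vm_storage_ha-ip
pcs constraint remove pcs_rsc_order_set_ha-ip_nfs-server_vm_storage
pcs constraint remove colocation-ha-ip-nfs-server-INFINITY

# Group the resources in start order; a group implies colocation,
# starts members in listed order, and stops them in reverse order.
pcs resource group add nfs-group nfs-server vm_storage ha-ip

# Re-enable fencing so the configured stonith devices are actually used.
pcs property set stonith-enabled=true
```
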
On-board IPMI requires a back-up method such as an intelligent power
switch or sbd.

> Quorum:
>   Options:
>
> thanks for the help!
> best regards
> Stefan
--
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org