On Fri, 2020-12-11 at 16:37 +0100, Gabriele Bulfon wrote:
> I found I can do this temporarily:
>
> crm config property cib-bootstrap-options: no-quorum-policy=ignore
>
> then once node 2 is up again:
>
> crm config property cib-bootstrap-options: no-quorum-policy=stop
>
> so that I make sure nodes will not mount in another strange
> situation.
>
> Is there any better way? (such as ignore until everything is back to
> normal, then consider stop again)
>
> Gabriele

When node 2 is known to be down and staying down, I'd probably disable
wait_for_all in corosync on node 1, start the cluster on node 1, then
re-enable wait_for_all on node 1 (either immediately, or right before
I'm ready to return node 2 to the cluster, depending on how long that
might be). If a third host is available for a lightweight process,
qdevice would be another option.
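For illustration, the setting involved lives in the quorum section of
corosync.conf (typically /etc/corosync/corosync.conf); a minimal sketch,
assuming a standard two-node votequorum setup -- the surrounding file
contents here are assumed, not taken from the actual cluster:

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # temporarily 0 while node 2 is known to be down and staying down;
        # set back to 1 before node 2 is returned to the cluster
        wait_for_all: 0
    }

The edit would be made on node 1, and corosync restarted there (or its
configuration reloaded, if the corosync version supports that) before
starting the cluster on node 1.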
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>
> From: Gabriele Bulfon <gbul...@sonicle.com>
> To: Cluster Labs - All topics related to open-source clustering
> welcomed <users@clusterlabs.org>
> Date: 11 December 2020 15:51:28 CET
> Subject: Re: [ClusterLabs] Antw: [EXT] Recovering from node failure
>
> > I cannot use "wait_for_all: 0", because this would automatically
> > move a powered-off node from UNCLEAN to OFFLINE and mount the ZFS
> > pool (total risk!): I want to manually move it from UNCLEAN to
> > OFFLINE, when I know that the 2nd node is actually off!
> >
> > Actually, with wait_for_all at its default (1) that was the case,
> > so node1 would wait for my intervention when booting while node2
> > is down.
> > So what I think I need is some way to manually override the quorum
> > in such a case (node 2 down for maintenance, node 1 reboot), so I
> > would manually turn node2 from UNCLEAN to OFFLINE, manually
> > override quorum, and have the zpool mounted and the NFS IP up.
> >
> > Any idea?
> >
> > ----------------------------------------------------------------------
> >
> > From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
> > To: users@clusterlabs.org
> > Date: 11 December 2020 11:35:44 CET
> > Subject: [ClusterLabs] Antw: [EXT] Recovering from node failure
> >
> > > Hi!
> > >
> > > Did you take care of the special "two node" settings (quorum, I
> > > mean)?
> > > When I use "crm_mon -1Arfj", I see something like
> > > " * Current DC: h19 (version 2.0.4+20200616.2deceaa3a-3.3.1-
> > > 2.0.4+20200616.2deceaa3a) - partition with quorum"
> > >
> > > What do you see?
> > >
> > > Regards,
> > > Ulrich
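(As an aside: on a node that has lost quorum, that same crm_mon status
line would read "partition WITHOUT quorum" instead, and the vote counts
can be checked directly from corosync. A minimal sketch, assuming the
standard corosync/votequorum tools are installed:

    # on node 1, while node 2 is down
    corosync-quorumtool -s        # expected vs. total votes, Quorate flag
    crm_mon -1Arfj | grep -i quorum

Neither command changes anything; they only report the current quorum
state.)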
> > > >>> Gabriele Bulfon <gbul...@sonicle.com> wrote on 11.12.2020
> > > at 11:23 in message <350849824.6300.1607682209284@www>:
> > > > Hi, I finally got stonith with IPMI working in my 2-node
> > > > XStreamOS/illumos storage cluster.
> > > > I have NFS IPs and a shared storage zpool moving from one node
> > > > to the other, and stonith controlling IPMI, powering off a node
> > > > when something is not clear.
> > > >
> > > > What happens now is that if I shut down the 2nd node, I see the
> > > > OFFLINE status from node 1 and everything is up and running,
> > > > and this is ok:
> > > >
> > > > Online: [ xstha1 ]
> > > > OFFLINE: [ xstha2 ]
> > > >
> > > > Full list of resources:
> > > > xstha1_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> > > > xstha2_san0_IP (ocf::heartbeat:IPaddr): Started xstha1
> > > > xstha1-stonith (stonith:external/ipmi): Started xstha1
> > > > xstha2-stonith (stonith:external/ipmi): Started xstha1
> > > > zpool_data (ocf::heartbeat:ZFS): Started xstha1
> > > >
> > > > But if I also reboot the 1st node, it comes up with node 2 in
> > > > the UNCLEAN state and nothing running, so I clear the state of
> > > > node 2, but the resources are not started:
> > > >
> > > > Online: [ xstha1 ]
> > > > OFFLINE: [ xstha2 ]
> > > >
> > > > Full list of resources:
> > > > xstha1_san0_IP (ocf::heartbeat:IPaddr): Stopped
> > > > xstha2_san0_IP (ocf::heartbeat:IPaddr): Stopped
> > > > xstha1-stonith (stonith:external/ipmi): Stopped
> > > > xstha2-stonith (stonith:external/ipmi): Stopped
> > > > zpool_data (ocf::heartbeat:ZFS): Stopped
> > > >
> > > > I tried restarting zpool_data or other resources:
> > > >
> > > > # crm resource start zpool_data
> > > >
> > > > but nothing happens!
> > > > How can I recover from this state? Node 2 needs to stay down,
> > > > but I want node 1 to work.
> > > > Thanks!
> > > > Gabriele

--
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/