Re: [ClusterLabs] questions about startup fencing

2017-12-06 Thread Adam Spiers
Ken Gaillot  wrote: 
On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote: 
Ken Gaillot  wrote: 
On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote: 


[snipped]

Let's suppose further that the cluster configuration is such 
that no stateful resources which could potentially conflict 
with other nodes will ever get launched on that 5th node.  For 
example it might only host stateless clones, or resources with 
requires=nothing set, or it might not even host any resources at 
all due to some temporary constraints which have been applied. 

In those cases, what is to be gained from fencing?  The only 
thing I can think of is that using (say) IPMI to power-cycle 
the node *might* fix whatever issue was preventing it from 
joining the cluster.  Are there any other reasons for fencing 
in this case?  It wouldn't help avoid any data corruption, at 
least. 


Just because constraints are telling the node it can't run a 
resource doesn't mean the node isn't malfunctioning and running 
it anyway.  If the node can't tell us it's OK, we have to assume 
it's not. 


Sure, but even if it *is* running it, if it's not conflicting with 
anything or doing any harm, is it really always better to fence 
regardless? 


There's a resource meta-attribute "requires" that says what a resource 
needs to start. If it can't do any harm if it runs awry, you can set 
requires="quorum" (or even "nothing"). 

So, that's sort of a way to let the cluster know that, but it doesn't 
currently do what you're suggesting, since start-up fencing is purely 
about the node and not about the resources. I suppose if the cluster 
had no resources requiring fencing (or, to push it further, no such 
resources that will be probed on that node), we could disable start-up 
fencing, but that's not done currently. 
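
For concreteness, a minimal crmsh sketch of the "requires" meta-attribute Ken describes (the resource name is made up for illustration):

    # hypothetical harmless resource ("dummy-safe" is invented): quorum is
    # enough to start it, so its recovery does not depend on fencing
    crm configure primitive dummy-safe ocf:pacemaker:Dummy \
        meta requires=quorum
    # requires=nothing would even allow it to start in an inquorate partition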


Yeah, that's the kind of thing I was envisaging. 

Disclaimer: to a certain extent I'm playing devil's advocate here to 
stimulate a closer (re-)examination of the axiom we've grown so used 
to over the years that if we don't know what a node is doing, we 
should fence it.  I'm not necessarily arguing that fencing is wrong 
here, but I think it's healthy to occasionally go back to first 
principles and re-question why we are doing things a certain way, to 
make sure that the original assumptions still hold true.  I'm 
familiar with the pain that our customers experience when nodes are 
fenced for less than very compelling reasons, so I think it's worth 
looking for opportunities to reduce fencing to when it's really 
needed.


The fundamental purpose of a high-availability cluster is to keep the 
desired service functioning, above all other priorities (including, 
unfortunately, making sysadmins' lives easier). 

If a service requires an HA cluster, it's a safe bet it will have 
problems in a split-brain situation (otherwise, why bother with the 
overhead). Even something as simple as an IP address will render a 
service useless if it's brought up on two machines on a network. 

Fencing is really the only hammer we have in that situation. At that 
point, we have zero information about what the node is doing. If it's 
powered off (or cut off from disk/network), we know it's not doing 
anything.


Fencing may not always help the situation, but it's all we've got. 


Sure, but I'm not (necessarily) even talking about a split-brain 
situation.  For example what if a cluster with remote nodes is shut 
down cleanly, and then all the core nodes boot up cleanly but none of 
the remote nodes are powered on till hours or even days later? 

If I understand Yan correctly, in this situation all the remotes will 
be marked as needing fencing, and this is the bit that doesn't make 
sense to me.  If Pacemaker can't reach *any* remotes, it can't start 
any resources on those remotes, so (in the case where resources are 
partitioned cleanly into those which run on remotes vs. those which 
don't) there is no danger of any concurrency violation.  So fencing 
remotes before you can use them is totally pointless.  Surely fencing 
of node A should only happen when Pacemaker is ready to start, on 
node B, a resource X which might already be running on node A.  But 
if no such node B exists then fencing is overkill.  It would be better 
to wait until the first remote joins the cluster, at which point 
Pacemaker can assess its current state and decide the best course of 
action.  Otherwise it's like cutting off your nose to spite your face. 

In fact, in the particular scenario which triggered this whole 
discussion, I suspect the above also applies even if some 
remotes joined the newly booted cluster quickly whilst others still 
take hours or days to boot - because in that scenario it is 
additionally safe to assume that none of the resources managed on 
those remotes by pacemaker_remoted would ever be started by anything 
other than pacemaker_remoted, since a) the whole system is configured 
automatically in a way which 

Re: [ClusterLabs] questions about startup fencing

2017-12-01 Thread Ken Gaillot
On Fri, 2017-12-01 at 16:21 -0600, Ken Gaillot wrote:
> On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote:
> > Ken Gaillot  wrote:
> > > On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
> > > > Hi all,
> > > > 
> > > > A colleague has been valiantly trying to help me belatedly
> > > > learn
> > > > about
> > > > the intricacies of startup fencing, but I'm still not fully
> > > > understanding some of the finer points of the behaviour.
> > > > 
> > > > The documentation on the "startup-fencing" option[0] says
> > > > 
> > > > Advanced Use Only: Should the cluster shoot unseen nodes?
> > > > Not
> > > > using the default is very unsafe!
> > > > 
> > > > and that it defaults to TRUE, but doesn't elaborate any
> > > > further:
> > > > 
> > > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
> > > > 
> > > > Let's imagine the following scenario:
> > > > 
> > > > - We have a 5-node cluster, with all nodes running cleanly.
> > > > 
> > > > - The whole cluster is shut down cleanly.
> > > > 
> > > > - The whole cluster is then started up again.  (Side question:
> > > > what
> > > >   happens if the last node to shut down is not the first to
> > > > start
> > > > up?
> > > >   How will the cluster ensure it has the most recent version of
> > > > the
> > > >   CIB?  Without that, how would it know whether the last man
> > > > standing
> > > >   was shut down cleanly or not?)
> > > 
> > > Of course, the cluster can't know what CIB version nodes it
> > > doesn't
> > > see
> > > have, so if a set of nodes is started with an older version, it
> > > will go
> > > with that.
> > 
> > Right, that's what I expected.
> > 
> > > However, a node can't do much without quorum, so it would be
> > > difficult
> > > to get in a situation where CIB changes were made with quorum
> > > before
> > > shutdown, but none of those nodes are present at the next start-
> > > up
> > > with
> > > quorum.
> > > 
> > > In any case, when a new node joins a cluster, the nodes do
> > > compare
> > > CIB
> > > versions. If the new node has a newer CIB, the cluster will use
> > > it.
> > > If
> > > other changes have been made since then, the newest CIB wins, so
> > > one or
> > > the other's changes will be lost.
> > 
> > Ahh, that's interesting.  Based on reading
> > 
> > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties
> > 
> > whichever node has the highest (admin_epoch, epoch, num_updates)
> > tuple
> > will win, so normally in this scenario it would be the epoch which
> > decides it, i.e. whichever node had the most changes since the last
> > time the conflicting nodes shared the same config - right?
> 
> Correct ... assuming the code for that is working properly, which I
> haven't confirmed :)
> 
> > 
> > And if that would choose the wrong node, admin_epoch can be set
> > manually to override that decision?
> 
> Correct again, with same caveat
> 
> > 
> > > Whether missing nodes were shut down cleanly or not relates to
> > > your
> > > next question ...
> > > 
> > > > - 4 of the nodes boot up fine and rejoin the cluster within the
> > > >   dc-deadtime interval, forming a quorum, but the 5th doesn't.
> > > > 
> > > > IIUC, with startup-fencing enabled, this will result in that
> > > > 5th
> > > > node
> > > > automatically being fenced.  If I'm right, is that really
> > > > *always*
> > > > necessary?
> > > 
> > > It's always safe. :-) As you mentioned, if the missing node was
> > > the
> > > last one alive in the previous run, the cluster can't know
> > > whether
> > > it
> > > shut down cleanly or not. Even if the node was known to shut down
> > > cleanly in the last run, the cluster still can't know whether the
> > > node
> > > was started since then and is now merely unreachable. So, fencing
> > > is
> > > necessary to ensure it's not accessing resources.
> > 
> > I get that, but I was questioning the "necessary to ensure it's not
> > accessing resources" part of this statement.  My point is that
> > sometimes this might be overkill, because sometimes we might be
> > able
> > to
> > discern through other methods that there are no resources we need
> > to
> > worry about potentially conflicting with what we want to
> > run.  That's
> > why I gave the stateless clones example.
> > 
> > > The same scenario is why a single node can't have quorum at
> > > start-
> > > up in
> > > a cluster with "two_node" set. Both nodes have to see each other
> > > at
> > > least once before they can assume it's safe to do anything.
> > 
> > Yep.
> > 
> > > > Let's suppose further that the cluster configuration is such
> > > > that
> > > > no
> > > > stateful resources which could potentially conflict with other
> > > > nodes
> > > > will ever get launched on that 5th node.  For example it might
> > > > only
> > > > host stateless clones, or resources with 

Re: [ClusterLabs] questions about startup fencing

2017-12-01 Thread Ken Gaillot
On Thu, 2017-11-30 at 11:58 +, Adam Spiers wrote:
> Ken Gaillot  wrote:
> > On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
> > > Hi all,
> > > 
> > > A colleague has been valiantly trying to help me belatedly learn
> > > about
> > > the intricacies of startup fencing, but I'm still not fully
> > > understanding some of the finer points of the behaviour.
> > > 
> > > The documentation on the "startup-fencing" option[0] says
> > > 
> > > Advanced Use Only: Should the cluster shoot unseen nodes? Not
> > > using the default is very unsafe!
> > > 
> > > and that it defaults to TRUE, but doesn't elaborate any further:
> > > 
> > > https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
> > > 
> > > Let's imagine the following scenario:
> > > 
> > > - We have a 5-node cluster, with all nodes running cleanly.
> > > 
> > > - The whole cluster is shut down cleanly.
> > > 
> > > - The whole cluster is then started up again.  (Side question:
> > > what
> > >   happens if the last node to shut down is not the first to start
> > > up?
> > >   How will the cluster ensure it has the most recent version of
> > > the
> > >   CIB?  Without that, how would it know whether the last man
> > > standing
> > >   was shut down cleanly or not?)
> > 
> > Of course, the cluster can't know what CIB version nodes it doesn't
> > see
> > have, so if a set of nodes is started with an older version, it
> > will go
> > with that.
> 
> Right, that's what I expected.
> 
> > However, a node can't do much without quorum, so it would be
> > difficult
> > to get in a situation where CIB changes were made with quorum
> > before
> > shutdown, but none of those nodes are present at the next start-up
> > with
> > quorum.
> > 
> > In any case, when a new node joins a cluster, the nodes do compare
> > CIB
> > versions. If the new node has a newer CIB, the cluster will use it.
> > If
> > other changes have been made since then, the newest CIB wins, so
> > one or
> > the other's changes will be lost.
> 
> Ahh, that's interesting.  Based on reading
> 
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties
> 
> whichever node has the highest (admin_epoch, epoch, num_updates)
> tuple
> will win, so normally in this scenario it would be the epoch which
> decides it, i.e. whichever node had the most changes since the last
> time the conflicting nodes shared the same config - right?

Correct ... assuming the code for that is working properly, which I
haven't confirmed :)

> 
> And if that would choose the wrong node, admin_epoch can be set
> manually to override that decision?

Correct again, with same caveat

> 
> > Whether missing nodes were shut down cleanly or not relates to your
> > next question ...
> > 
> > > - 4 of the nodes boot up fine and rejoin the cluster within the
> > >   dc-deadtime interval, forming a quorum, but the 5th doesn't.
> > > 
> > > IIUC, with startup-fencing enabled, this will result in that 5th
> > > node
> > > automatically being fenced.  If I'm right, is that really
> > > *always*
> > > necessary?
> > 
> > It's always safe. :-) As you mentioned, if the missing node was the
> > last one alive in the previous run, the cluster can't know whether
> > it
> > shut down cleanly or not. Even if the node was known to shut down
> > cleanly in the last run, the cluster still can't know whether the
> > node
> > was started since then and is now merely unreachable. So, fencing
> > is
> > necessary to ensure it's not accessing resources.
> 
> I get that, but I was questioning the "necessary to ensure it's not
> accessing resources" part of this statement.  My point is that
> sometimes this might be overkill, because sometimes we might be able
> to
> discern through other methods that there are no resources we need to
> worry about potentially conflicting with what we want to run.  That's
> why I gave the stateless clones example.
> 
> > The same scenario is why a single node can't have quorum at start-
> > up in
> > a cluster with "two_node" set. Both nodes have to see each other at
> > least once before they can assume it's safe to do anything.
> 
> Yep.
> 
> > > Let's suppose further that the cluster configuration is such that
> > > no
> > > stateful resources which could potentially conflict with other
> > > nodes
> > > will ever get launched on that 5th node.  For example it might
> > > only
> > > host stateless clones, or resources with requires=nothing set, or
> > > it
> > > might not even host any resources at all due to some temporary
> > > constraints which have been applied.
> > > 
> > > In those cases, what is to be gained from fencing?  The only
> > > thing I
> > > can think of is that using (say) IPMI to power-cycle the node
> > > *might*
> > > fix whatever issue was preventing it from joining the
> > > cluster.  Are
> > > there any other reasons for fencing in this 

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Adam Spiers

Ken Gaillot  wrote:

On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:

Hi all,

A colleague has been valiantly trying to help me belatedly learn
about
the intricacies of startup fencing, but I'm still not fully
understanding some of the finer points of the behaviour.

The documentation on the "startup-fencing" option[0] says

Advanced Use Only: Should the cluster shoot unseen nodes? Not
using the default is very unsafe!

and that it defaults to TRUE, but doesn't elaborate any further:

https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
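
For reference, startup-fencing is a cluster property; toggling it would look roughly like this (crmsh/crm_attribute sketch, shown only for illustration — the documentation's advice is to leave the default alone):

    # check the effective value (unset means the default, true)
    crm_attribute --type crm_config --name startup-fencing --query
    # explicitly disable start-up fencing -- "very unsafe" per the docs
    crm configure property startup-fencing=false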

Let's imagine the following scenario:

- We have a 5-node cluster, with all nodes running cleanly.

- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)


Of course, the cluster can't know what CIB version nodes it doesn't see
have, so if a set of nodes is started with an older version, it will go
with that.


Right, that's what I expected.


However, a node can't do much without quorum, so it would be difficult
to get in a situation where CIB changes were made with quorum before
shutdown, but none of those nodes are present at the next start-up with
quorum.

In any case, when a new node joins a cluster, the nodes do compare CIB
versions. If the new node has a newer CIB, the cluster will use it. If
other changes have been made since then, the newest CIB wins, so one or
the other's changes will be lost.


Ahh, that's interesting.  Based on reading

   
https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ch03.html#_cib_properties

whichever node has the highest (admin_epoch, epoch, num_updates) tuple
will win, so normally in this scenario it would be the epoch which
decides it, i.e. whichever node had the most changes since the last
time the conflicting nodes shared the same config - right?

And if that would choose the wrong node, admin_epoch can be set
manually to override that decision?
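
A sketch of how that can be inspected and, if needed, overridden (cibadmin; the value 1 is just an example):

    # the version tuple lives as attributes on the root <cib> element
    cibadmin --query | head -n 1
    #   <cib admin_epoch="0" epoch="42" num_updates="7" ...>
    # admin_epoch is never touched by the cluster itself, so bumping it on
    # the node whose configuration should win makes that CIB sort highest
    cibadmin --modify --xml-text '<cib admin_epoch="1"/>'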


Whether missing nodes were shut down cleanly or not relates to your
next question ...


- 4 of the nodes boot up fine and rejoin the cluster within the
  dc-deadtime interval, forming a quorum, but the 5th doesn't.

IIUC, with startup-fencing enabled, this will result in that 5th node
automatically being fenced.  If I'm right, is that really *always*
necessary?


It's always safe. :-) As you mentioned, if the missing node was the
last one alive in the previous run, the cluster can't know whether it
shut down cleanly or not. Even if the node was known to shut down
cleanly in the last run, the cluster still can't know whether the node
was started since then and is now merely unreachable. So, fencing is
necessary to ensure it's not accessing resources.


I get that, but I was questioning the "necessary to ensure it's not
accessing resources" part of this statement.  My point is that
sometimes this might be overkill, because sometimes we might be able to
discern through other methods that there are no resources we need to
worry about potentially conflicting with what we want to run.  That's
why I gave the stateless clones example.


The same scenario is why a single node can't have quorum at start-up in
a cluster with "two_node" set. Both nodes have to see each other at
least once before they can assume it's safe to do anything.


Yep.


Let's suppose further that the cluster configuration is such that no
stateful resources which could potentially conflict with other nodes
will ever get launched on that 5th node.  For example it might only
host stateless clones, or resources with requires=nothing set, or it
might not even host any resources at all due to some temporary
constraints which have been applied.

In those cases, what is to be gained from fencing?  The only thing I
can think of is that using (say) IPMI to power-cycle the node *might*
fix whatever issue was preventing it from joining the cluster.  Are
there any other reasons for fencing in this case?  It wouldn't help
avoid any data corruption, at least.


Just because constraints are telling the node it can't run a resource
doesn't mean the node isn't malfunctioning and running it anyway. If
the node can't tell us it's OK, we have to assume it's not.


Sure, but even if it *is* running it, if it's not conflicting with
anything or doing any harm, is it really always better to fence
regardless?

Disclaimer: to a certain extent I'm playing devil's advocate here to
stimulate a closer (re-)examination of the axiom we've grown so used
to over the years that if we don't know what a node is doing, we
should fence it.  I'm not necessarily arguing that fencing is wrong
here, but I think it's healthy to occasionally go back to first

Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Thu, Nov 30, 2017 at 1:39 PM, Gao,Yan  wrote:
> On 11/30/2017 09:14 AM, Andrei Borzenkov wrote:
>>
>> On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot  wrote:
>>>
>>>
>>> The same scenario is why a single node can't have quorum at start-up in
>>> a cluster with "two_node" set. Both nodes have to see each other at
>>> least once before they can assume it's safe to do anything.
>>>
>>
>> Unless we set no-quorum-policy=ignore in which case it will proceed
>> after fencing another node. As far as I understand, this is the only
>> way to get the number of active cluster nodes below quorum, right?
>
> To be safe, "two_node: 1" automatically enables "wait_for_all". Of course
> one can explicitly disable "wait_for_all" if they know what they are doing.
>

Well ...

ha1:~ # crm corosync status
Printing ring status.
Local node ID 1084766299
RING ID 0
id  = 192.168.56.91
status  = ring 0 active with no faults
Quorum information
--
Date: Thu Nov 30 19:09:57 2017
Quorum provider:  corosync_votequorum
Nodes:1
Node ID:  1084766299
Ring ID:  412
Quorate:  No

Votequorum information
--
Expected votes:   2
Highest expected: 2
Total votes:  1
Quorum:   1 Activity blocked
Flags:            2Node WaitForAll

Membership information
--
Nodeid  Votes Name
1084766299  1 ha1 (local)
ha1:~ #

ha1:~ # crm_mon -1r
Stack: corosync
Current DC: ha1 (version 1.1.16-4.8-77ea74d) - partition WITHOUT quorum
Last updated: Thu Nov 30 19:08:03 2017
Last change: Thu Nov 30 11:05:03 2017 by root via cibadmin on ha1

2 nodes configured
3 resources configured

Online: [ ha1 ]
OFFLINE: [ ha2 ]

Full list of resources:

 stonith-sbd(stonith:external/sbd): Started ha1
 Master/Slave Set: ms_Stateful_1 [rsc_Stateful_1]
 Masters: [ ha1 ]
 Stopped: [ ha2 ]
ha1:~ #



Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Gao,Yan

On 11/30/2017 09:14 AM, Andrei Borzenkov wrote:

On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot  wrote:


The same scenario is why a single node can't have quorum at start-up in
a cluster with "two_node" set. Both nodes have to see each other at
least once before they can assume it's safe to do anything.



Unless we set no-quorum-policy=ignore in which case it will proceed
after fencing another node. As far as I understand, this is the only
way to get the number of active cluster nodes below quorum, right?
To be safe, "two_node: 1" automatically enables "wait_for_all". Of 
course one can explicitly disable "wait_for_all" if they know what they 
are doing.
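
For reference, the relevant corosync.conf quorum section would look something like this (illustrative sketch; two_node implies wait_for_all unless it is explicitly switched off):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # wait_for_all defaults to 1 when two_node is set; only disable it
        # if you really know what you are doing
        # wait_for_all: 0
    }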


Regards,
  Yan






Re: [ClusterLabs] questions about startup fencing

2017-11-30 Thread Andrei Borzenkov
On Wed, Nov 29, 2017 at 6:54 PM, Ken Gaillot  wrote:
>
> The same scenario is why a single node can't have quorum at start-up in
> a cluster with "two_node" set. Both nodes have to see each other at
> least once before they can assume it's safe to do anything.
>

Unless we set no-quorum-policy=ignore in which case it will proceed
after fencing another node. As far as I understand, this is the only
way to get the number of active cluster nodes below quorum, right?
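
For reference, that policy is set as a Pacemaker cluster property (crmsh sketch):

    # allow this partition to keep running resources without quorum;
    # it will still fence the unseen peer first -- generally only sensible
    # in a two-node cluster with reliable fencing
    crm configure property no-quorum-policy=ignore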



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 09:09 PM, Kristoffer Grönlund wrote:
> Adam Spiers  writes:
>
>> OK, so reading between the lines, if we don't want our cluster's
>> latest config changes accidentally discarded during a complete cluster
>> reboot, we should ensure that the last man standing is also the first
>> one booted up - right?
> That would make sense to me, but I don't know if it's the only
> solution. If you separately ensure that they all have the same
> configuration first, you could start them in any order I guess.

I guess it is not that bad: after the last man standing has left
the stage, it would take a quorate number of nodes (actually depending
on how many you allow to survive) before anything happens again
(equivalent to wait_for_all in 2-node clusters), and one of those
should have a reasonably current CIB.

>
>> If so, I think that's a perfectly reasonable thing to ask for, but
>> maybe it should be documented explicitly somewhere?  Apologies if it
>> is already and I missed it.
> Yeah, maybe a section discussing both starting and stopping a whole
> cluster would be helpful, but I don't know if I feel like I've thought
> about it enough myself. Regarding the HP Service Guard commands that
> Ulrich Windl mentioned, the very idea of such commands offends me on
> some level but I don't know if I can clearly articulate why. :D
>




Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

>
> OK, so reading between the lines, if we don't want our cluster's
> latest config changes accidentally discarded during a complete cluster
> reboot, we should ensure that the last man standing is also the first
> one booted up - right?

That would make sense to me, but I don't know if it's the only
solution. If you separately ensure that they all have the same
configuration first, you could start them in any order I guess.

>
> If so, I think that's a perfectly reasonable thing to ask for, but
> maybe it should be documented explicitly somewhere?  Apologies if it
> is already and I missed it.

Yeah, maybe a section discussing both starting and stopping a whole
cluster would be helpful, but I don't know if I feel like I've thought
about it enough myself. Regarding the HP Service Guard commands that
Ulrich Windl mentioned, the very idea of such commands offends me on
some level but I don't know if I can clearly articulate why. :D

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Kristoffer Gronlund  wrote:

Adam Spiers  writes:

Kristoffer Gronlund  wrote:

Adam Spiers  writes:


- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)


This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.


Sorry, I don't follow you at all here.  When you start the cluster up
again, the cluster config from before the shutdown is still there.
That's very far from being a completely new cluster :-)


You have a new cluster with (possibly fragmented) memories of a previous
life ;)


Well yeah, that's another way of describing it :-)


Yes, exactly.  If the first node to start up was not the last man
standing, the CIB history is effectively being forked.  So how is this
issue avoided?


The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.


That's ignoring the cluster config, which persists even when the
cluster's down.


There could be a command in pacemaker which resets a set of nodes to a
common known state, basically to pick the CIB from one of the nodes as
the survivor and copy that to all of them. But in the end, that's just
the same thing as just picking one node as the first node, and telling
the others to join that one and to discard their configurations. So,
treating it as a new cluster.


OK, so reading between the lines, if we don't want our cluster's
latest config changes accidentally discarded during a complete cluster
reboot, we should ensure that the last man standing is also the first
one booted up - right?

If so, I think that's a perfectly reasonable thing to ask for, but
maybe it should be documented explicitly somewhere?  Apologies if it
is already and I missed it.



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Klaus Wenninger  wrote:

On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote:

Adam Spiers  writes:


- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)

This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.

When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster? The next node could
have a newer version of the CIB which adds yet more nodes to the
cluster.


To make it even clearer imagine a node being reverted
to a previous state by recovering it from a backup.


Yes, I'm asking how this kind of scenario is dealt with :-)

Another example is a config change being made after one or more of the
cluster nodes had already been shut down.



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Adam Spiers

Kristoffer Gronlund  wrote:

Adam Spiers  writes:


- The whole cluster is shut down cleanly.

- The whole cluster is then started up again.  (Side question: what
  happens if the last node to shut down is not the first to start up?
  How will the cluster ensure it has the most recent version of the
  CIB?  Without that, how would it know whether the last man standing
  was shut down cleanly or not?)


This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.


Sorry, I don't follow you at all here.  When you start the cluster up
again, the cluster config from before the shutdown is still there.
That's very far from being a completely new cluster :-)


When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster?


That was exactly my question.


The next node could have a newer version of the CIB which adds yet
more nodes to the cluster.


Yes, exactly.  If the first node to start up was not the last man
standing, the CIB history is effectively being forked.  So how is this
issue avoided?


The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.


That's ignoring the cluster config, which persists even when the
cluster's down.

But to be clear, you picked a small side question from my original
post and answered that.  The main questions I had were about startup
fencing :-)



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Gao,Yan

On 11/29/2017 04:54 PM, Ken Gaillot wrote:

On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:

The same questions apply if this troublesome node was actually a
remote node running pacemaker_remoted, rather than the 5th node in
the
cluster.


Remote nodes don't join at the crmd level as cluster nodes do, so they
don't "start up" in the same sense, and start-up fencing doesn't apply
to them. Instead, the cluster initiates the connection when called for
(I don't remember for sure whether it fences the remote node if the
connection fails, but that would make sense).
According to link_rsc2remotenode() and handle_startup_fencing(), similar 
startup fencing applies to remote nodes too. So if a remote resource 
fails to start, the remote node will be fenced. The global setting 
startup-fencing=false will change the behavior for remote nodes too.
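
For illustration, a remote node is defined by a connection resource along these lines (crmsh sketch; the node name and intervals are made up) — it is the start failure of this resource that would lead to the remote node being fenced:

    crm configure primitive remote1 ocf:pacemaker:remote \
        params server=remote1.example.com reconnect_interval=60s \
        op monitor interval=30s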


Regards,
  Yan



Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Ken Gaillot
On Wed, 2017-11-29 at 14:22 +, Adam Spiers wrote:
> Hi all,
> 
> A colleague has been valiantly trying to help me belatedly learn
> about
> the intricacies of startup fencing, but I'm still not fully
> understanding some of the finer points of the behaviour.
> 
> The documentation on the "startup-fencing" option[0] says
> 
> Advanced Use Only: Should the cluster shoot unseen nodes? Not
> using the default is very unsafe!
> 
> and that it defaults to TRUE, but doesn't elaborate any further:
> 
> https://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-cluster-options.html
> 
> Let's imagine the following scenario:
> 
> - We have a 5-node cluster, with all nodes running cleanly.
> 
> - The whole cluster is shut down cleanly.
> 
> - The whole cluster is then started up again.  (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB?  Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

Of course, the cluster can't know what CIB version nodes it doesn't see
have, so if a set of nodes is started with an older version, it will go
with that.

However, a node can't do much without quorum, so it would be difficult
to get in a situation where CIB changes were made with quorum before
shutdown, but none of those nodes are present at the next start-up with
quorum.

In any case, when a new node joins a cluster, the nodes do compare CIB
versions. If the new node has a newer CIB, the cluster will use it. If
other changes have been made since then, the newest CIB wins, so one or
the other's changes will be lost.

Whether missing nodes were shut down cleanly or not relates to your
next question ...

> - 4 of the nodes boot up fine and rejoin the cluster within the
>   dc-deadtime interval, forming a quorum, but the 5th doesn't.
> 
> IIUC, with startup-fencing enabled, this will result in that 5th node
> automatically being fenced.  If I'm right, is that really *always*
> necessary?

It's always safe. :-) As you mentioned, if the missing node was the
last one alive in the previous run, the cluster can't know whether it
shut down cleanly or not. Even if the node was known to shut down
cleanly in the last run, the cluster still can't know whether the node
was started since then and is now merely unreachable. So, fencing is
necessary to ensure it's not accessing resources.

The same scenario is why a single node can't have quorum at start-up in
a cluster with "two_node" set. Both nodes have to see each other at
least once before they can assume it's safe to do anything.

> Let's suppose further that the cluster configuration is such that no
> stateful resources which could potentially conflict with other nodes
> will ever get launched on that 5th node.  For example it might only
> host stateless clones, or resources with requires=nothing set, or it
> might not even host any resources at all due to some temporary
> constraints which have been applied.
> 
> In those cases, what is to be gained from fencing?  The only thing I
> can think of is that using (say) IPMI to power-cycle the node *might*
> fix whatever issue was preventing it from joining the cluster.  Are
> there any other reasons for fencing in this case?  It wouldn't help
> avoid any data corruption, at least.

Just because constraints are telling the node it can't run a resource
doesn't mean the node isn't malfunctioning and running it anyway. If
the node can't tell us it's OK, we have to assume it's not.

> Now let's imagine the same scenario, except rather than a clean full
> cluster shutdown, all nodes were affected by a power cut, but also
> this time the whole cluster is configured to *only* run stateless
> clones, so there is no risk of conflict between two nodes
> accidentally
> running the same resource.  On startup, the 4 nodes in the quorum
> have
> no way of knowing that the 5th node was also affected by the power
> cut, so in theory from their perspective it could still be running a
> stateless clone.  Again, is there anything to be gained from fencing
> the 5th node once it exceeds the dc-deadtime threshold for joining,
> other than the chance that a reboot might fix whatever was preventing
> it from joining, and get the cluster back to full strength?

If a cluster runs only services that have no potential to conflict,
then you don't need a cluster. :-)

Unique clones require communication even if they're stateless (think
IPaddr2). I'm pretty sure even some anonymous stateless clones require
communication to avoid issues.
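
For example, the classic cluster-IP setup clones IPaddr2 as a globally unique clone, and the instances have to coordinate through the cluster even though each one is "stateless" (crmsh sketch; the address and counts are illustrative):

    crm configure primitive cluster-ip ocf:heartbeat:IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24 clusterip_hash=sourceip-sourceport \
        op monitor interval=10s
    crm configure clone cluster-ip-clone cluster-ip \
        meta globally-unique=true clone-max=2 clone-node-max=2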

> Also, when exactly does the dc-deadtime timer start ticking?
> Is it reset to zero after a node is fenced, so that potentially that
> node could go into a reboot loop if dc-deadtime is set too low?

A node's crmd starts the timer at start-up and whenever a new election
starts; the timer is stopped when the DC makes it a join offer. I don't 

Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Klaus Wenninger
On 11/29/2017 04:23 PM, Kristoffer Grönlund wrote:
> Adam Spiers  writes:
>
>> - The whole cluster is shut down cleanly.
>>
>> - The whole cluster is then started up again.  (Side question: what
>>   happens if the last node to shut down is not the first to start up?
>>   How will the cluster ensure it has the most recent version of the
>>   CIB?  Without that, how would it know whether the last man standing
>>   was shut down cleanly or not?)
> This is my opinion, I don't really know what the "official" pacemaker
> stance is: There is no such thing as shutting down a cluster cleanly. A
> cluster is a process stretching over multiple nodes - if they all shut
> down, the process is gone. When you start up again, you effectively have
> a completely new cluster.
>
> When starting up, how is the cluster, at any point, to know if the
> cluster it has knowledge of is the "latest" cluster? The next node could
> have a newer version of the CIB which adds yet more nodes to the
> cluster.

To make it even clearer imagine a node being reverted
to a previous state by recovering it from a backup.

Regards,
Klaus

>
> The only way to bring up a cluster from being completely stopped is to
> treat it as creating a completely new cluster. The first node to start
> "creates" the cluster and later nodes join that cluster.
>
> Cheers,
> Kristoffer
>




Re: [ClusterLabs] questions about startup fencing

2017-11-29 Thread Kristoffer Grönlund
Adam Spiers  writes:

> - The whole cluster is shut down cleanly.
>
> - The whole cluster is then started up again.  (Side question: what
>   happens if the last node to shut down is not the first to start up?
>   How will the cluster ensure it has the most recent version of the
>   CIB?  Without that, how would it know whether the last man standing
>   was shut down cleanly or not?)

This is my opinion, I don't really know what the "official" pacemaker
stance is: There is no such thing as shutting down a cluster cleanly. A
cluster is a process stretching over multiple nodes - if they all shut
down, the process is gone. When you start up again, you effectively have
a completely new cluster.

When starting up, how is the cluster, at any point, to know if the
cluster it has knowledge of is the "latest" cluster? The next node could
have a newer version of the CIB which adds yet more nodes to the
cluster.

The only way to bring up a cluster from being completely stopped is to
treat it as creating a completely new cluster. The first node to start
"creates" the cluster and later nodes join that cluster.

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com
