I suspect this is fixed in newer versions. It's not a join timing issue
but some sort of peer state bug, and that area has changed a good deal
since this version of the code.

A few comments inline ...
On Wed, 2022-09-14 at 12:40 +0200, Lars Ellenberg wrote:
> On Thu, Sep 08, 2022 at 10:11:46AM -050
On Thu, Sep 08, 2022 at 10:11:46AM -0500, Ken Gaillot wrote:
> On Thu, 2022-09-08 at 15:01 +0200, Lars Ellenberg wrote:
> > Scenario:
> > three nodes, no fencing (I know)
> > break network, isolating nodes
> > unbreak network, see how cluster partitions rejoin and resume service
>
> I'm guessing t
On Thu, 2022-09-08 at 15:01 +0200, Lars Ellenberg wrote:
> Scenario:
> three nodes, no fencing (I know)
> break network, isolating nodes
> unbreak network, see how cluster partitions rejoin and resume service
I'm guessing the CIB changed during the break, with more changes in one
of the other part
Scenario:
three nodes, no fencing (I know)
break network, isolating nodes
unbreak network, see how cluster partitions rejoin and resume service
Funny outcome:
/usr/sbin/crm_mon -x pe-input-689.bz2
Cluster Summary:
* Stack: corosync
* Current DC: mqhavm24 (version 1.1.24.linbit-2.0.el7-8f22