Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-29 Thread Klaus Wenninger
On Thu, Mar 24, 2022 at 4:12 PM Ken Gaillot  wrote:
>
> On Wed, 2022-03-23 at 05:30 +, Balotra, Priyanka wrote:
> > Hi All,
> >
> > We have a scenario on SLES 12 SP3 cluster.
> > The scenario is explained as follows in the order of events:
> >  There is a 2-node cluster (FILE-1, FILE-2)
> >  The cluster and the resources were up and running fine initially.
> >  Then a fencing request from Pacemaker was issued on both nodes
> > simultaneously.
> >
> > Logs from 1st node:
> > 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ]
> > Failed to receive the leave message. failed: 2
> > .
> > .
> > 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]:
> > notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
> >
> > Logs from 2nd node:
> > 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ]
> > Failed to receive the leave message. failed: 1
> > .
> > .
> > Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith)
> > notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
> >
> >  When the nodes came up after unfencing, the DC got set after
> > election
> >  After that the resources which were expected to run on only one node
> > became active on both (all) nodes of the cluster.
> >
> > 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> > 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> > 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> > 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> > 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
> > 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> > 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
> > 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> > 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
> > 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> > 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
> > 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> > 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
> >
> > Can you guys please help us understand if this is indeed a split-
> > brain scenario? Under what circumstances can such a scenario be
> > observed?
>
> This does look like a split-brain, and the most likely cause is that
> the fence agent reported that fencing was successful, but it actually
> wasn't.
>
> What are you using as a fencing device?
>
> If you're using watchdog-based SBD, that won't work with only two
> nodes, because both nodes will assume they still have quorum, and not
> self-fence. You need either true quorum or a shared external drive to
> use SBD.

We see a fencing resource stonith-sbd, so I would guess
poison-pill fencing is configured.
We should also verify that stonith-watchdog-timeout is not set to
anything other than 0, just to be sure the cluster would never fall
back to watchdog fencing.
Maybe you can try inserting the poison pill manually and see whether
the targeted node reboots. You can do that either with high-level
tooling such as crmsh or pcs, or with the sbd binary directly on the
command line.
Try that both from the node to be rebooted and from the other node,
e.g. to check whether both sides see the same disk(s).
Check that the disk(s) configured for the sbd service are the same
as those configured for the sbd fencing resource (and of course,
when using sbd as a command-line tool to insert a poison pill, the
same disks have to be used as well).
Is the sbd service running without complaints?
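A rough sketch of those checks on the command line; the device path
and node name are placeholders for your actual setup:

  # Which disk(s) is the sbd service configured with?
  grep '^SBD_DEVICE' /etc/sysconfig/sbd

  # Is the sbd service itself running without complaints?
  systemctl status sbd

  # Make sure there is no fallback to watchdog fencing
  # (should print 0, or the attribute should not be set at all):
  crm_attribute --type crm_config --name stonith-watchdog-timeout --query

  # Compare the disks used by the fencing resource with the ones above:
  crm configure show stonith-sbd

  # Inspect the poison-pill disk and its node slots
  # (run on both nodes -- both must see the same header and slots):
  sbd -d /dev/disk/by-id/<shared-disk> dump
  sbd -d /dev/disk/by-id/<shared-disk> list

  # Insert a poison pill manually; the target node should reboot:
  sbd -d /dev/disk/by-id/<shared-disk> message FILE-1 reset

  # Or trigger it through the high-level tooling instead:
  crm node fence FILE-1      # crmsh
  pcs stonith fence FILE-1   # pcs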

Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-24 Thread Ken Gaillot
On Wed, 2022-03-23 at 05:30 +, Balotra, Priyanka wrote:
> Hi All,
>  
> We have a scenario on SLES 12 SP3 cluster.
> The scenario is explained as follows in the order of events:
>  There is a 2-node cluster (FILE-1, FILE-2)
>  The cluster and the resources were up and running fine initially.
>  Then a fencing request from Pacemaker was issued on both nodes
> simultaneously.
>  
> Logs from 1st node:  
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ]
> Failed to receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]:
> notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
>  
> Logs from 2nd node:  
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ]
> Failed to receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith)
> notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
>  
>  When the nodes came up after unfencing, the DC got set after
> election
>  After that the resources which were expected to run on only one node
> became active on both (all) nodes of the cluster.
>  
> 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
> 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
> 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
> 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
> 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
>  
> Can you guys please help us understand if this is indeed a split-
> brain scenario? Under what circumstances can such a scenario be
> observed?

This does look like a split-brain, and the most likely cause is that
the fence agent reported that fencing was successful, but it actually
wasn't.

What are you using as a fencing device?

If you're using watchdog-based SBD, that won't work with only two
nodes, because both nodes will assume they still have quorum, and not
self-fence. You need either true quorum or a shared external drive to
use SBD.
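
For context, two-node clusters typically set the votequorum two_node
flag, which is exactly what lets both halves of a split keep quorum;
a minimal sketch of the relevant corosync.conf section:

  quorum {
      provider: corosync_votequorum
      two_node: 1
      # two_node implies wait_for_all: 1, so a freshly booted node waits
      # for its peer before claiming quorum -- but once both nodes are up
      # and then lose contact, each side still retains quorum, which is
      # why watchdog-only SBD cannot arbitrate a two-node split.
  }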

> We can have a very serious impact if such a case re-occurs in spite
> of stonith already being configured. Hence the ask.
> In case this situation gets reproduced, how can it be handled?
> 
> Note: We have stonith configured and it has been working fine so far.
> In this case also, the initial fencing happened from stonith only.
>  
> Thanks in advance!
-- 
Ken Gaillot 



Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-23 Thread Andrei Borzenkov
On 23.03.2022 08:30, Balotra, Priyanka wrote:
> Hi All,
> 
> We have a scenario on SLES 12 SP3 cluster.
> The scenario is explained as follows in the order of events:
> 
>   *   There is a 2-node cluster (FILE-1, FILE-2)
>   *   The cluster and the resources were up and running fine initially.
>   *   Then a fencing request from Pacemaker was issued on both nodes
> simultaneously.
> 
> Logs from 1st node:
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to 
> receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice: 
> Requesting that FILE-1 perform 'off' action targeting FILE-2
> 
> Logs from 2nd node:
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to 
> receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith) notice: 
> Requesting that FILE-2 perform 'off' action targeting FILE-1
> 

This is normal behavior in case of split brain. Each node will try to
fence the other node so that it can take over the resources from it.
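
One common way to soften this mutual-fencing race (it does not fix the
underlying communication problem) is a random fencing delay, so both
nodes do not shoot at exactly the same moment; for example, with crmsh
and the stonith-sbd resource from the logs:

  # Let pacemaker-fenced wait a random 0-30s before executing fencing;
  # the value is illustrative, pick something shorter than your timeouts:
  crm resource param stonith-sbd set pcmk_delay_max 30s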

> 
>   *   When the nodes came up after unfencing, the DC got set after election

What exactly does "came up" mean?

>   *   After that the resources which were expected to run on only one node 
> became active on both (all) nodes of the cluster.
> 

It sounds like both nodes believed fencing had been successful, and so
each node took over the other's resources. It is impossible to tell
more without seeing the actual logs from both nodes and the actual
configuration.
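
If the situation reproduces, a crm_report archive covering the incident
window from both nodes is the most useful thing to attach; the time
window below is just the one from this incident:

  # Run on one node; collects logs, CIB and config from reachable nodes:
  crm_report -f "2022-02-22 03:00:00" -t "2022-02-22 05:00:00" /tmp/too_active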

> 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
> 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
> 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
> 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
> 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
> 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
> 
> 
> Can you guys please help us understand if this is indeed a split-brain
> scenario?

I do not understand this question, and I suspect you are using "split
brain" incorrectly. Split brain is the condition in which
corosync/pacemaker on the two nodes cannot communicate. Split brain
ends with a fencing request.
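
A quick way to recognize it while it is happening is to compare the
membership and quorum view on both nodes, e.g.:

  # Run on each node and compare: in a split brain each side lists only
  # itself as a member, yet may still report quorate because of two_node:
  corosync-quorumtool -s

  # One-shot cluster status; each side will also elect its own DC:
  crm_mon -1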

> Under what circumstances can such a scenario be observed?

If "such a scenario" refers to "split brain": whenever the two nodes
are unable to communicate with each other.

> We can have a very serious impact if such a case re-occurs in spite of
> stonith already being configured. Hence the ask.
> In case this situation gets reproduced, how can it be handled?
> 

A stonith agent must never return success unless it can confirm that
fencing was successful.
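
In other words, the off/reboot action has to poll the device until the
target's state is confirmed; a hypothetical sketch in shell, where
power_off and power_status stand in for whatever the real device API is:

  # Hypothetical sketch -- power_off/power_status are placeholder helpers.
  do_off() {
      target="$1"
      power_off "$target" || return 1      # sending the command failed
      # Report success only once the device confirms the node is off:
      for _ in $(seq 1 30); do
          [ "$(power_status "$target")" = "off" ] && return 0
          sleep 1
      done
      return 1    # state never confirmed -> must report failure
  }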

> Note: We have stonith configured and it has been working fine so far. In this 
> case also, the initial fencing happened from stonith only.
> 
> Thanks in advance!
> 

[ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-23 Thread Priyanka Balotra
Hi All,



We have a scenario on SLES 12 SP3 cluster.

The scenario is explained as follows in the order of events:

-   There is a 2-node cluster (FILE-1, FILE-2)

-   The cluster and the resources were up and running fine initially.

-   Then a fencing request from Pacemaker was issued on both nodes
simultaneously.



Logs from 1st node:

2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to
receive the leave message. failed: 2

.

.

2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice:
Requesting that FILE-1 perform 'off' action targeting FILE-2



Logs from 2nd node:

2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to
receive the leave message. failed: 1

.

.

Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith)
notice: Requesting that FILE-2 perform 'off' action targeting FILE-1



-   When the nodes came up after unfencing, the DC got set after
election

-   After that the resources which were expected to run on only one
node became active on both (all) nodes of the cluster.





27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)





Can you guys please help us understand if this is indeed a split-brain
scenario? Under what circumstances can such a scenario be observed?

We can have a very serious impact if such a case re-occurs in spite of
stonith already being configured. Hence the ask.

In case this situation gets reproduced, how can it be handled?

Note: We have stonith configured and it has been working fine so far. In
this case also, the initial fencing happened from stonith only.



Thanks in advance!

Priyanka


[ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-23 Thread Balotra, Priyanka
Hi All,

We have a scenario on SLES 12 SP3 cluster.
The scenario is explained as follows in the order of events:

  *   There is a 2-node cluster (FILE-1, FILE-2)
  *   The cluster and the resources were up and running fine initially.
  *   Then a fencing request from Pacemaker was issued on both nodes
simultaneously.

Logs from 1st node:
2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to 
receive the leave message. failed: 2
.
.
2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice: 
Requesting that FILE-1 perform 'off' action targeting FILE-2

Logs from 2nd node:
2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to 
receive the leave message. failed: 1
.
.
Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith) notice: 
Requesting that FILE-2 perform 'off' action targeting FILE-1


  *   When the nodes came up after unfencing, the DC got set after election
  *   After that the resources which were expected to run on only one node 
became active on both (all) nodes of the cluster.

27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)


Can you guys please help us understand if this is indeed a split-brain
scenario? Under what circumstances can such a scenario be observed?
We can have a very serious impact if such a case re-occurs in spite of
stonith already being configured. Hence the ask.
In case this situation gets reproduced, how can it be handled?

Note: We have stonith configured and it has been working fine so far. In this 
case also, the initial fencing happened from stonith only.

Thanks in advance!