Hi! With these messages it's really hard to say, because you omitted the messages logged before the split brain occurred. If a resource was running on FILE-2 and FILE-1 recovered first, FILE-1 will become DC and will start resources (even if those were running on FILE-2 before). However, normal resources are gone when a node reboots. Maybe your resources are special, or maybe the monitor operation is not correct.
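Whether each node may consider itself quorate on its own is governed by the corosync quorum settings, so those would be useful to see as well. For reference, a minimal sketch of a two-node votequorum fragment (assuming corosync 2.x as shipped with SLES 12; the values are illustrative, not taken from this cluster):

```
quorum {
    provider: corosync_votequorum
    # two_node lets a single surviving node keep quorum; it implies
    # wait_for_all, so both nodes must be seen once at startup before
    # either may run resources alone
    two_node: 1
    wait_for_all: 1
}
```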
We need more details.

Regards,
Ulrich

>>> "Balotra, Priyanka" <priyanka.balo...@dell.com> wrote on 23.03.2022 at 06:30 in message
<mw4pr19mb549588c581d01842b480b9f5ea...@mw4pr19mb5495.namprd19.prod.outlook.com>:
> Hi All,
>
> We have a scenario on a SLES 12 SP3 cluster.
> The order of events is as follows:
>
> * There is a 2-node cluster (FILE-1, FILE-2).
> * The cluster and the resources were up and running fine initially.
> * Then a fencing request from Pacemaker was issued on both nodes
> simultaneously.
>
> Logs from the 1st node:
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to
> receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice:
> Requesting that FILE-1 perform 'off' action targeting FILE-2
>
> Logs from the 2nd node:
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to
> receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced[5015] (call_remote_stonith) notice:
> Requesting that FILE-2 perform 'off' action targeting FILE-1
>
> * When the nodes came up after unfencing, the DC was elected.
> * After that, the resources which were expected to run on only one node
> became active on both (all) nodes of the cluster.
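The two mutual "perform 'off'" requests quoted above look like a classic fence race: each node asked to fence the other in the same instant. A common mitigation is a randomized fencing delay so that one node reliably wins the race. A hedged crm-shell sketch (the primitive name and agent are assumptions in SLES style, not taken from this cluster's CIB):

```
# Illustrative crm-shell fragment, not the poster's actual configuration.
# pcmk_delay_max adds a random delay of up to 15s before the fence action
# fires, so simultaneous fence requests no longer kill both nodes at once.
primitive stonith-sbd stonith:external/sbd \
    params pcmk_delay_max=15s
```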
>
> 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: notice:
> See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
> 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: notice:
> See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
> 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource IP_Floating is active on 2 nodes (attempting recovery)
> 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: notice:
> See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
> 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource Service_Postgresql is active on 2 nodes (attempting recovery)
> 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: notice:
> See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
> 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource Service_Postgrest is active on 2 nodes (attempting recovery)
> 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: notice:
> See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
> 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource Service_esm_primary is active on 2 nodes (attempting recovery)
> 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: notice:
> See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more
> information
> 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: error:
> Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
>
> Can you please help us understand whether this is indeed a split-brain
> scenario? Under what circumstances can such a scenario be observed?
> It would have a very serious impact if such a case could re-occur in spite
> of STONITH already being configured; hence the ask.
> In case this situation is reproduced, how can it be handled?
>
> Note: We have STONITH configured and it has been working fine so far. In
> this case too, the initial fencing was done by STONITH.
>
> Thanks in advance!

_______________________________________________
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
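When triaging an incident like the one quoted above, it helps to pull the list of resources the scheduler flagged as multi-active out of the logs. A small sketch (the log path and the sample contents are reconstructed from the excerpt above, purely for illustration):

```shell
#!/bin/sh
# Build a sample log reconstructed from the excerpt above (path is illustrative).
log=/tmp/schedulerd_sample.log
cat > "$log" <<'EOF'
2022-02-22T04:16:31 FILE-2 pacemaker-schedulerd[5018]: error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
2022-02-22T04:16:31 FILE-2 pacemaker-schedulerd[5018]: notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
2022-02-22T04:16:31 FILE-2 pacemaker-schedulerd[5018]: error: Resource IP_Floating is active on 2 nodes (attempting recovery)
EOF

# Print one resource name per "is active on N nodes" error.
grep -o 'Resource [^ ]* is active on [0-9]* nodes' "$log" | awk '{ print $2 }'
```

Run against the real /var/log/messages (or journalctl output) instead of the sample file, this gives a quick list of every resource needing cleanup after the nodes rejoin.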