Re: [ClusterLabs] Antw: [EXT] Re: Parsing the output of crm_mon
Also xmllint has '--xpath' (unless you are running something as old as
RHEL6) and is available on every Linux distro.

Best Regards,
Strahil Nikolov

On Mon, Mar 21, 2022 at 15:41, Ken Gaillot wrote:

On Mon, 2022-03-21 at 08:27 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot wrote on 18.03.2022 at 13:39 in message:
> > On Fri, 2022-03-18 at 08:46 +0100, Ulrich Windl wrote:
> > > Hi!
> > >
> > > Parsing the output of crm_mon I wonder:
> > > Is there a collection of sample outputs for pacemaker 1 and 2
> > > formats showing all types of resources?
> >
> > Ideally, any parsing should be done of the XML output generated by
> > --output-as=xml since 2.0.3 and --as-xml before then (the output is
> > identical other than the outermost tag).
>
> Agreed, but it's much trickier to parse XML with awk ;-)
> Maybe it's even less efficient (unless crm_mon itself is much more
> efficient when outputting XML).
> With XPath support, I might be able to create the output I need using
> crm_mon only, but that's not implemented.
>
> Regards,
> Ulrich

xmlstarlet can search xpaths, e.g.

crm_mon -1 --output-as=xml | xmlstarlet sel -t -v "//element/@attribute"

> > The XML output is stable and only gets backward-compatible
> > additions once in a long while, but the text output changes more
> > frequently and significantly.
> >
> > There's an RNG schema for it, api-result.rng (where it's installed
> > depends on your build; in the source repository, make generates it
> > under xml/api).
> >
> > > Also I realized that the output for clone sets is unfortunate:
> > > Consider a normal primitive like this:
> > >   * primitive_name (ocf::heartbeat:agent_name): Started host-name
> > > And a clone set:
> > >   * Clone Set: clone_name [primitive_name]:
> > >
> > > If you want to filter clone sets by resource agent you're lost
> > > there. It would have been nice if the format of clone sets were:
> > >   * Clone Set: clone_name [primitive_name] (ocf::heartbeat:agent_name):
> > >
> > > I see that there's the "-R" option that "expands" the clones
> > > similarly to resource groups like this:
> > >   * primitive_name (ocf::heartbeat:agent): Started host-name
> > >
> > > Regards,
> > > Ulrich

--
Ken Gaillot

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
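The xmllint route suggested in the thread above can be sketched like this.
The XML below is a hand-written stand-in for `crm_mon -1 --output-as=xml`
output; the element and attribute names (`clone`, `resource`,
`resource_agent`) mirror the pacemaker 2.x layout but should be checked
against your installed api-result.rng before relying on them.

```shell
# Stand-in for `crm_mon -1 --output-as=xml` (hypothetical resource names):
sample='<pacemaker-result>
  <resources>
    <clone id="clone_name">
      <resource id="primitive_name" resource_agent="ocf::heartbeat:Dummy" role="Started"/>
    </clone>
  </resources>
</pacemaker-result>'

# Which agent backs the clone's member primitive? xmllint (libxml2)
# evaluates the XPath against stdin ("-") and prints the string value.
printf '%s\n' "$sample" |
  xmllint --xpath 'string(//clone/resource/@resource_agent)' -
```

In real use, replace the `printf` with the `crm_mon` invocation itself;
this answers the clone-set filtering question from the thread without any
awk post-processing.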
[ClusterLabs] Goodbye crm_report?
Hi all,

Pacemaker's crm_report tool collects cluster information for debugging,
which is especially helpful for upstream or commercial support
investigations.

The external sosreport tool has plugins for ClusterLabs software,
providing a more familiar and generic interface for the same
functionality. sosreport is readily available in many distros, including
Fedora, RHEL, SUSE, Debian, Ubuntu, and similar.

We are thinking of merging what is still useful from the old and creaky
crm_report code into the sosreport plugins, and deprecating crm_report.
Does anyone have any strong emotional attachment to keeping crm_report
around? :-)

It would remain available for a long transition period, to give the
updated sosreport plugins time to make their way into distros, and
higher-level tools and user scripts time to be updated.

--
Ken Gaillot
[ClusterLabs] pcs 0.10.12 released
I am happy to announce the latest release of pcs, version 0.10.12.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.12.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.12.zip

The main feature added in this release is support for OCF 1.1 agents.

Complete change log for this release:

## [0.10.12] - 2021-11-30

### Added
- Option `--autodelete` of command `pcs resource move` is fully
  supported ([rhbz#1990784])
- Support for OCF 1.1 resource and stonith agents ([rhbz#2018969])

### Fixed
- Do not show warning that no stonith device was detected and
  stonith-enabled is not false when a stonith device is in a group
  ([ghpull#370])
- Misleading error message from `pcs quorum unblock` when
  `wait_for_all=0` ([rhbz#1968088])
- Misleading error message from `pcs booth setup` and `pcs booth pull`
  when the booth config directory (`/etc/booth`) is missing
  ([rhbz#1791670], [ghpull#411], [ghissue#225])

Thanks / congratulations to everyone who contributed to this release,
including Hideo Yamauchi, Ivan Devat, Michal Pospisil, Miroslav Lisik,
Ondrej Mular, Tomas Jelinek and vivi.

Cheers,
Tomas

[ghissue#225]: https://github.com/ClusterLabs/pcs/issues/225
[ghpull#370]: https://github.com/ClusterLabs/pcs/pull/370
[ghpull#411]: https://github.com/ClusterLabs/pcs/pull/411
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1968088]: https://bugzilla.redhat.com/show_bug.cgi?id=1968088
[rhbz#1990784]: https://bugzilla.redhat.com/show_bug.cgi?id=1990784
[rhbz#2018969]: https://bugzilla.redhat.com/show_bug.cgi?id=2018969
[ClusterLabs] pcs 0.11.2 released
I am happy to announce the latest release of pcs, version 0.11.2.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.2.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.2.zip

Complete change log for this release:

## [0.11.2] - 2022-02-01

### Fixed
- Pcs was not automatically enabling corosync-qdevice when adding a
  quorum device to a cluster (broken since pcs-0.10.9) ([rhbz#2028902])
- `resource update` command exiting with a traceback when updating a
  resource with a non-existing resource agent ([rhbz#2019836])
- pcs\_snmp\_agent is working again (broken since pcs-0.10.1)
  ([ghpull#431])
- Skip checking of scsi devices to be removed before unfencing to be
  added devices ([rhbz#2033248])
- Make `ocf:linbit:drbd` agent pass OCF standard validation
  ([ghissue#441], [rhbz#2036633])
- Multiple improvements of `pcs resource move` command ([rhbz#1996062])
- Pcs no longer creates Pacemaker-1.x CIB when `-f` is used, so running
  `pcs cluster cib-upgrade` manually is not needed ([rhbz#2022463])

### Deprecated
- Usage of `pcs resource` commands for stonith resources and vice versa
  ([rhbz#1301204])

Thanks / congratulations to everyone who contributed to this release,
including Fabio M. Di Nitto, Miroslav Lisik, Ondrej Mular, Tomas Jelinek
and Valentin Vidić.

Cheers,
Tomas

[ghissue#441]: https://github.com/ClusterLabs/pcs/issues/441
[ghpull#431]: https://github.com/ClusterLabs/pcs/pull/431
[rhbz#1301204]: https://bugzilla.redhat.com/show_bug.cgi?id=1301204
[rhbz#1996062]: https://bugzilla.redhat.com/show_bug.cgi?id=1996062
[rhbz#2019836]: https://bugzilla.redhat.com/show_bug.cgi?id=2019836
[rhbz#2022463]: https://bugzilla.redhat.com/show_bug.cgi?id=2022463
[rhbz#2028902]: https://bugzilla.redhat.com/show_bug.cgi?id=2028902
[rhbz#2033248]: https://bugzilla.redhat.com/show_bug.cgi?id=2033248
[rhbz#2036633]: https://bugzilla.redhat.com/show_bug.cgi?id=2036633
[ClusterLabs] pcs 0.10.13 released
I am happy to announce the latest release of pcs, version 0.10.13.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.13.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.13.zip

Complete change log for this release:

## [0.10.13] - 2022-01-31

### Fixed
- Pcs was not automatically enabling corosync-qdevice when adding a
  quorum device to a cluster (broken since pcs-0.10.9) ([rhbz#2028902])
- `resource update` command exiting with a traceback when updating a
  resource with a non-existing resource agent ([rhbz#1384485])
- pcs\_snmp\_agent is working again (broken since pcs-0.10.1)
  ([ghpull#431])
- Skip checking of scsi devices to be removed before unfencing to be
  added devices ([rhbz#2032997])
- Make `ocf:linbit:drbd` agent pass OCF standard validation
  ([ghissue#441], [rhbz#2036633])
- Multiple improvements of `pcs resource move --autodelete` command
  ([rhbz#1990784])
- Pcs no longer creates Pacemaker-1.x CIB when `-f` is used, so running
  `pcs cluster cib-upgrade` manually is not needed ([rhbz#2022463])

Thanks / congratulations to everyone who contributed to this release,
including Miroslav Lisik, Ondrej Mular, Tomas Jelinek and Valentin
Vidić.

Cheers,
Tomas

[ghissue#441]: https://github.com/ClusterLabs/pcs/issues/441
[ghpull#431]: https://github.com/ClusterLabs/pcs/pull/431
[rhbz#1384485]: https://bugzilla.redhat.com/show_bug.cgi?id=1384485
[rhbz#1990784]: https://bugzilla.redhat.com/show_bug.cgi?id=1990784
[rhbz#2022463]: https://bugzilla.redhat.com/show_bug.cgi?id=2022463
[rhbz#2028902]: https://bugzilla.redhat.com/show_bug.cgi?id=2028902
[rhbz#2032997]: https://bugzilla.redhat.com/show_bug.cgi?id=2032997
[rhbz#2036633]: https://bugzilla.redhat.com/show_bug.cgi?id=2036633
[ClusterLabs] pcs 0.11.1 released
I am happy to announce the latest release of pcs, version 0.11.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.1.zip

This is the first release of the pcs-0.11 branch. The branch fully
supports clusters running Corosync 3.x and Pacemaker 2.1.x. In the
meantime, we are still providing the pcs-0.10 branch, supporting
Pacemaker 2.0.x and 2.1.x compiled with the '--enable-compat-2.0'
option.

The most important changes in this release:
* Support for OCF 1.1 agents
* Manually moving resources without leaving location constraints behind
* Errors, warnings and progress-related output are now printed to stderr
  instead of stdout
* Role names "Promoted" and "Unpromoted" are preferred. Legacy names
  "Master" and "Slave" still work, but they are deprecated.

Complete change log for this release:

## [0.11.1] - 2021-11-30

### Removed
- Deprecated obsolete commands `pcs config import-cman` and `pcs config
  export pcs-commands|pcs-commands-verbose` have been removed
  ([rhbz#1881064])
- Unused and unmaintained pcsd urls: `/remote/config_backup`,
  `/remote/node_available`, `/remote/resource_status`
- Pcsd no longer provides data in the format used by web UI in pcs
  0.9.142 and older

### Added
- Explicit confirmation is now required to prevent accidental
  destruction of the cluster with `pcs cluster destroy` ([rhbz#1283805])
- Add/remove cli syntax for command `pcs stonith update-scsi-devices`
  ([rhbz#1992668])
- Command `pcs resource move` is fully supported ([rhbz#1990787])
- Support for OCF 1.1 resource and stonith agents ([rhbz#2018969])

### Changed
- Pcs no longer depends on the python3-distro package
- `pcs status xml` now prints cluster status in the new format provided
  by Pacemaker 2.1 ([rhbz#1985981])
- All errors, warnings and progress-related output are now printed to
  stderr instead of stdout
- Make roles `Promoted` and `Unpromoted` default ([rhbz#1885293])
- Make auto-deleting constraints the default for the `pcs resource move`
  command ([rhbz#1996062])
- Deprecation warnings use a "Deprecation Warning:" prefix instead of
  "Warning:" on the command line
- Minimal required version of Python has been changed to 3.9
- Minimal required version of Ruby has been changed to 2.5
- Minimal supported version of Pacemaker is 2.1

### Fixed
- Do not unfence newly added devices on fenced cluster nodes
  ([rhbz#1991654])
- Fix displaying fencing levels with regular-expression targets
  ([rhbz#1533090])
- Reject cloning of stonith resources ([rhbz#1811072])
- Do not show warning that no stonith device was detected and
  stonith-enabled is not false when a stonith device is in a group
  ([ghpull#370])
- Misleading error message from `pcs quorum unblock` when
  `wait_for_all=0` ([rhbz#1968088])
- Misleading error message from `pcs booth setup` and `pcs booth pull`
  when the booth config directory (`/etc/booth`) is missing
  ([rhbz#1791670], [ghpull#411], [ghissue#225])

### Deprecated
- Legacy role names `Master` and `Slave` ([rhbz#1885293])
- Option `--master` is deprecated and has been replaced by option
  `--promoted` ([rhbz#1885293])

Thanks / congratulations to everyone who contributed to this release,
including Hideo Yamauchi, Ivan Devat, Michal Pospisil, Michele
Baldessari, Miroslav Lisik, Ondrej Mular, Tomas Jelinek and vivi.

Cheers,
Tomas

[ghissue#225]: https://github.com/ClusterLabs/pcs/issues/225
[ghpull#370]: https://github.com/ClusterLabs/pcs/pull/370
[ghpull#411]: https://github.com/ClusterLabs/pcs/pull/411
[rhbz#1283805]: https://bugzilla.redhat.com/show_bug.cgi?id=1283805
[rhbz#1533090]: https://bugzilla.redhat.com/show_bug.cgi?id=1533090
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1811072]: https://bugzilla.redhat.com/show_bug.cgi?id=1811072
[rhbz#1881064]: https://bugzilla.redhat.com/show_bug.cgi?id=1881064
[rhbz#1885293]: https://bugzilla.redhat.com/show_bug.cgi?id=1885293
[rhbz#1968088]: https://bugzilla.redhat.com/show_bug.cgi?id=1968088
[rhbz#1985981]: https://bugzilla.redhat.com/show_bug.cgi?id=1985981
[rhbz#1990787]: https://bugzilla.redhat.com/show_bug.cgi?id=1990787
[rhbz#1991654]: https://bugzilla.redhat.com/show_bug.cgi?id=1991654
[rhbz#1992668]: https://bugzilla.redhat.com/show_bug.cgi?id=1992668
[rhbz#1996062]: https://bugzilla.redhat.com/show_bug.cgi?id=1996062
[rhbz#2018969]: https://bugzilla.redhat.com/show_bug.cgi?id=2018969
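The pcs 0.11 change that moves errors, warnings and progress output to
stderr matters for anyone scripting around pcs: stdout now carries only
the machine-readable payload. A minimal sketch of the capture pattern,
where `emit_status` is a stand-in for a pcs call (it is not a real pcs
command):

```shell
# Stand-in for a pcs 0.11 invocation: warnings on stderr, payload on stdout.
emit_status() {
  echo "Warning: this goes to stderr" >&2
  echo '<status/>'                     # machine-readable payload
}

# Capture only the payload; warnings are discarded (or redirect them to a log).
xml=$(emit_status 2>/dev/null)
printf '%s\n' "$xml"
```

With a real cluster, the same shape would be something like
`pcs status xml 2>warnings.log > status.xml`, keeping diagnostics and
data cleanly separated.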
Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)
On Wed, 2022-03-23 at 05:30 +0000, Balotra, Priyanka wrote:
> Hi All,
>
> We have a scenario on a SLES 12 SP3 cluster. The scenario is explained
> as follows, in the order of events:
> There is a 2-node cluster (FILE-1, FILE-2).
> The cluster and the resources were up and running fine initially.
> Then a fencing request from pacemaker got issued on both nodes
> simultaneously.
>
> Logs from 1st node:
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ]
> Failed to receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]:
> notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
>
> Logs from 2nd node:
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ]
> Failed to receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith)
> notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
>
> When the nodes came up after unfencing, the DC got set after election.
> After that, the resources which were expected to run on only one node
> became active on both (all) nodes of the cluster.
>
> 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource IP_Floating is active on 2 nodes (attempting recovery)
> 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Service_Postgresql is active on 2 nodes (attempting
> recovery)
> 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Service_Postgrest is active on 2 nodes (attempting
> recovery)
> 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Service_esm_primary is active on 2 nodes (attempting
> recovery)
> 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting
> recovery)
>
> Can you guys please help us understand if this is indeed a split-brain
> scenario? Under what circumstances can such a scenario be observed?

This does look like a split-brain, and the most likely cause is that the
fence agent reported that fencing was successful, but it actually
wasn't. What are you using as a fencing device?

If you're using watchdog-based SBD, that won't work with only two nodes,
because both nodes will assume they still have quorum, and not
self-fence. You need either true quorum or a shared external drive to
use SBD.

> We can have a very serious impact if such a case re-occurs in spite of
> stonith already being configured. Hence the ask.
> In case this situation gets reproduced, how can it be handled?
>
> Note: We have stonith configured and it has been working fine so far.
> In this case also, the initial fencing happened from stonith only.
>
> Thanks in advance!

--
Ken Gaillot
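The two-node caveat about watchdog-based SBD comes down to corosync's
vote arithmetic. A typical two-node votequorum configuration looks like
the sketch below (a hypothetical corosync.conf fragment, not taken from
the poster's cluster): with `two_node: 1`, each node retains quorum
after losing contact with its peer, which is exactly why watchdog-only
SBD cannot arbitrate a split here — both sides consider themselves
quorate and neither self-fences.

```
quorum {
    provider: corosync_votequorum
    two_node: 1        # each surviving node keeps quorum after a split
    wait_for_all: 1    # but requires all nodes present at first startup
}
```

A shared SBD disk, or a third vote such as a corosync-qdevice arbitrator,
breaks the tie and makes self-fencing reliable in this topology.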