Re: [ClusterLabs] Antw: [EXT] Re: Parsing the output of crm_mon

2022-03-24 Thread Strahil Nikolov via Users
Also xmllint has '--xpath' (unless you are running something as old as RHEL 6),
and it is available on every Linux distro.
Best Regards,
Strahil Nikolov
 
 
  On Mon, Mar 21, 2022 at 15:41, Ken Gaillot wrote:
On Mon, 2022-03-21 at 08:27 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot wrote on 2022-03-18 at 13:39:
> > On Fri, 2022-03-18 at 08:46 +0100, Ulrich Windl wrote:
> > > Hi!
> > > 
> > > Parsing the output of crm_mon I wonder:
> > > Is there a collection of sample outputs for pacemaker 1 and 2
> > > formats
> > > showing all types of resources?
> > 
> > Ideally, any parsing should be done on the XML output, generated by
> > --output-as=xml since 2.0.3 and --as-xml before then (the output is
> > identical other than the outermost tag).
> 
> Agreed, but it's much trickier to parse XML with awk ;-)
> Maybe it's even less efficient (unless crm_mon itself is much more
> efficient when outputting XML).
> With XPath support, I might be able to create the output I need using
> crm_mon only, but that's not implemented.
> 
> Regards,
> Ulrich

xmlstarlet can search xpaths, e.g.

crm_mon -1 --output-as=xml | xmlstarlet sel -t -v "//element/@attribute"
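The xmllint route mentioned earlier in the thread works the same way. A minimal sketch (the XML snippet below is a hypothetical stand-in for real `crm_mon -1 --output-as=xml` output, and the `resource_agent` attribute name is assumed from the api-result schema):

```shell
# Stand-in for `crm_mon -1 --output-as=xml` output (hypothetical sample;
# real output follows the api-result.rng schema mentioned below).
cat > /tmp/crm_mon_sample.xml <<'EOF'
<pacemaker-result api-version="2.3" request="crm_mon -1 --output-as=xml">
  <resources>
    <resource id="IP_Floating" resource_agent="ocf::heartbeat:IPaddr2" role="Started"/>
  </resources>
</pacemaker-result>
EOF

# Extract one attribute with an XPath expression; with real crm_mon you
# would pipe instead:  crm_mon -1 --output-as=xml | xmllint --xpath '...' -
xmllint --xpath 'string(//resource/@resource_agent)' /tmp/crm_mon_sample.xml
```

For simple value extraction, xmlstarlet's `sel -t -v` and xmllint's `--xpath` are interchangeable; xmllint only requires libxml2.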

> 
> 
> > The XML output is stable and only gets backward-compatible
> > additions once in a long while, but the text output changes more
> > frequently and significantly.
> > 
> > There's an RNG schema for it, api-result.rng (where it's installed
> > depends on your build; in the source repository, make generates it
> > under xml/api).
> > 
> > > Also I realized that the output for clone sets is unfortunate:
> > > Consider a normal primitive like this:
> > >  * primitive_name    (ocf::heartbeat:agent_name):  Started host-name
> > > And a clone set:
> > >  * Clone Set: clone_name [primitive_name]:
> > > 
> > > If you want to filter clone sets by resource agent, you're lost
> > > there.
> > > It would have been nice if the format of clone sets were:
> > >  * Clone Set: clone_name [primitive_name] (ocf::heartbeat:agent_name):
> > > 
> > > I see that there's the "-R" option that "expands" the clones
> > > similarly to resource groups, like this:
> > >    * primitive_name    (ocf::heartbeat:agent):    Started host-name
> > > 
> > > Regards,
> > > Ulrich
> > > 
> > -- 
> > Ken Gaillot 
> > 
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Goodbye crm_report?

2022-03-24 Thread Ken Gaillot
Hi all,

Pacemaker's crm_report tool collects cluster information for debugging,
especially helpful for upstream or commercial support investigations.

The external sosreport tool has plugins for ClusterLabs software,
providing a more familiar and generic interface for the same
functionality. sosreport is readily available in many distros including
Fedora, RHEL, SUSE, Debian, Ubuntu, and similar.

We are thinking of merging what is still useful from the old and creaky
crm_report code into the sosreport plugins, and deprecating crm_report.

Does anyone have any strong emotional attachment to keeping crm_report
around? :-) It would remain available for a long transition period to
give time for the updated sosreport plugins to make their way into
distros and for higher-level tools and user scripts to be updated.
-- 
Ken Gaillot 



[ClusterLabs] pcs 0.10.12 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.12.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.12.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.12.zip

The main feature added in this release is support for OCF 1.1 agents.


Complete change log for this release:
## [0.10.12] - 2021-11-30

### Added
- Option `--autodelete` of command `pcs resource move` is fully
  supported ([rhbz#1990784])
- Support for OCF 1.1 resource and stonith agents ([rhbz#2018969])

### Fixed
- Do not show warning that no stonith device was detected and
  stonith-enabled is not false when a stonith device is in a group
  ([ghpull#370])
- Misleading error message from `pcs quorum unblock` when
  `wait_for_all=0` ([rhbz#1968088])
- Misleading error message from `pcs booth setup` and `pcs booth pull`
  when booth config directory (`/etc/booth`) is missing ([rhbz#1791670],
  [ghpull#411], [ghissue#225])


Thanks / congratulations to everyone who contributed to this release,
including Hideo Yamauchi, Ivan Devat, Michal Pospisil, Miroslav Lisik,
Ondrej Mular, Tomas Jelinek and vivi.

Cheers,
Tomas


[ghissue#225]: https://github.com/ClusterLabs/pcs/issues/225
[ghpull#370]: https://github.com/ClusterLabs/pcs/pull/370
[ghpull#411]: https://github.com/ClusterLabs/pcs/pull/411
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1968088]: https://bugzilla.redhat.com/show_bug.cgi?id=1968088
[rhbz#1990784]: https://bugzilla.redhat.com/show_bug.cgi?id=1990784
[rhbz#2018969]: https://bugzilla.redhat.com/show_bug.cgi?id=2018969



[ClusterLabs] pcs 0.11.2 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.2.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.2.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.2.zip

Complete change log for this release:
## [0.11.2] - 2022-02-01

### Fixed
- Pcs was not automatically enabling corosync-qdevice when adding a
  quorum device to a cluster (broken since pcs-0.10.9) ([rhbz#2028902])
- `resource update` command exiting with a traceback when updating a
  resource with a non-existing resource agent ([rhbz#2019836])
- pcs\_snmp\_agent is working again (broken since pcs-0.10.1)
  ([ghpull#431])
- Skip checking of scsi devices to be removed before unfencing to be
  added devices ([rhbz#2033248])
- Make `ocf:linbit:drbd` agent pass OCF standard validation
  ([ghissue#441], [rhbz#2036633])
- Multiple improvements of `pcs resource move` command ([rhbz#1996062])
- Pcs no longer creates Pacemaker-1.x CIB when `-f` is used, so running
  `pcs cluster cib-upgrade` manually is not needed ([rhbz#2022463])

### Deprecated
- Usage of `pcs resource` commands for stonith resources and vice versa
  ([rhbz#1301204])


Thanks / congratulations to everyone who contributed to this release,
including Fabio M. Di Nitto, Miroslav Lisik, Ondrej Mular, Tomas Jelinek
and Valentin Vidić.

Cheers,
Tomas


[ghissue#441]: https://github.com/ClusterLabs/pcs/issues/441
[ghpull#431]: https://github.com/ClusterLabs/pcs/pull/431
[rhbz#1301204]: https://bugzilla.redhat.com/show_bug.cgi?id=1301204
[rhbz#1996062]: https://bugzilla.redhat.com/show_bug.cgi?id=1996062
[rhbz#2019836]: https://bugzilla.redhat.com/show_bug.cgi?id=2019836
[rhbz#2022463]: https://bugzilla.redhat.com/show_bug.cgi?id=2022463
[rhbz#2028902]: https://bugzilla.redhat.com/show_bug.cgi?id=2028902
[rhbz#2033248]: https://bugzilla.redhat.com/show_bug.cgi?id=2033248
[rhbz#2036633]: https://bugzilla.redhat.com/show_bug.cgi?id=2036633



[ClusterLabs] pcs 0.10.13 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.13.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.13.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.13.zip


Complete change log for this release:
## [0.10.13] - 2022-01-31

### Fixed
- Pcs was not automatically enabling corosync-qdevice when adding a
  quorum device to a cluster (broken since pcs-0.10.9) ([rhbz#2028902])
- `resource update` command exiting with a traceback when updating a
  resource with a non-existing resource agent ([rhbz#1384485])
- pcs\_snmp\_agent is working again (broken since pcs-0.10.1)
  ([ghpull#431])
- Skip checking of scsi devices to be removed before unfencing to be
  added devices ([rhbz#2032997])
- Make `ocf:linbit:drbd` agent pass OCF standard validation
  ([ghissue#441], [rhbz#2036633])
- Multiple improvements of `pcs resource move --autodelete` command
  ([rhbz#1990784])
- Pcs no longer creates Pacemaker-1.x CIB when `-f` is used, so running
  `pcs cluster cib-upgrade` manually is not needed ([rhbz#2022463])


Thanks / congratulations to everyone who contributed to this release,
including Miroslav Lisik, Ondrej Mular, Tomas Jelinek and Valentin
Vidić.

Cheers,
Tomas


[ghissue#441]: https://github.com/ClusterLabs/pcs/issues/441
[ghpull#431]: https://github.com/ClusterLabs/pcs/pull/431
[rhbz#1384485]: https://bugzilla.redhat.com/show_bug.cgi?id=1384485
[rhbz#1990784]: https://bugzilla.redhat.com/show_bug.cgi?id=1990784
[rhbz#2022463]: https://bugzilla.redhat.com/show_bug.cgi?id=2022463
[rhbz#2028902]: https://bugzilla.redhat.com/show_bug.cgi?id=2028902
[rhbz#2032997]: https://bugzilla.redhat.com/show_bug.cgi?id=2032997
[rhbz#2036633]: https://bugzilla.redhat.com/show_bug.cgi?id=2036633



[ClusterLabs] pcs 0.11.1 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.1.zip

This is the first release of the pcs-0.11 branch. The branch fully
supports clusters running Corosync 3.x and Pacemaker 2.1.x. In the
meantime, we are still providing the pcs-0.10 branch, which supports
Pacemaker 2.0.x and 2.1.x compiled with the '--enable-compat-2.0' option.

The most important changes for this release:
* Support for OCF 1.1 agents
* Manually moving resources without leaving location constraints behind
* Errors, warnings and progress-related output are now printed to stderr
  instead of stdout
* Role names "Promoted" and "Unpromoted" are preferred. Legacy names
  "Master" and "Slave" still work, but they are deprecated.


Complete change log for this release:
## [0.11.1] - 2021-11-30

### Removed
- Deprecated obsolete commands `pcs config import-cman` and `pcs config
  export pcs-commands|pcs-commands-verbose` have been removed
  ([rhbz#1881064])
- Unused and unmaintained pcsd urls: `/remote/config_backup`,
  `/remote/node_available`, `/remote/resource_status`
- Pcsd no longer provides data in format used by web UI in pcs 0.9.142
  and older

### Added
- Explicit confirmation is now required to prevent accidental destroying
  of the cluster with `pcs cluster destroy` ([rhbz#1283805])
- Add add/remove CLI syntax for command `pcs stonith
  update-scsi-devices` ([rhbz#1992668])
- Command `pcs resource move` is fully supported ([rhbz#1990787])
- Support for OCF 1.1 resource and stonith agents ([rhbz#2018969])

### Changed
- Pcs no longer depends on python3-distro package
- 'pcs status xml' now prints cluster status in the new format provided
  by Pacemaker 2.1 ([rhbz#1985981])
- All errors, warnings and progress-related output are now printed to
  stderr instead of stdout
- Make roles `Promoted` and `Unpromoted` default ([rhbz#1885293])
- Make auto-deleting constraint default for `pcs resource move` command
  ([rhbz#1996062])
- Deprecation warnings use a "Deprecation Warning:" prefix instead of
  "Warning:" on the command line
- Minimal required version of python has been changed to 3.9
- Minimal required version of ruby has been changed to 2.5
- Minimal supported version of pacemaker is 2.1

### Fixed
- Do not unfence newly added devices on fenced cluster nodes
  ([rhbz#1991654])
- Fix displaying fencing levels with regular expression targets
  ([rhbz#1533090])
- Reject cloning of stonith resources ([rhbz#1811072])
- Do not show warning that no stonith device was detected and
  stonith-enabled is not false when a stonith device is in a group
  ([ghpull#370])
- Misleading error message from `pcs quorum unblock` when
  `wait_for_all=0` ([rhbz#1968088])
- Misleading error message from `pcs booth setup` and `pcs booth pull`
  when booth config directory (`/etc/booth`) is missing ([rhbz#1791670],
  [ghpull#411], [ghissue#225])

### Deprecated
- Legacy role names `Master` and `Slave` ([rhbz#1885293])
- Option `--master` is deprecated and has been replaced by option
  `--promoted` ([rhbz#1885293])


Thanks / congratulations to everyone who contributed to this release,
including Hideo Yamauchi, Ivan Devat, Michal Pospisil, Michele
Baldessari, Miroslav Lisik, Ondrej Mular, Tomas Jelinek and vivi.

Cheers,
Tomas


[ghissue#225]: https://github.com/ClusterLabs/pcs/issues/225
[ghpull#370]: https://github.com/ClusterLabs/pcs/pull/370
[ghpull#411]: https://github.com/ClusterLabs/pcs/pull/411
[rhbz#1283805]: https://bugzilla.redhat.com/show_bug.cgi?id=1283805
[rhbz#1533090]: https://bugzilla.redhat.com/show_bug.cgi?id=1533090
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1811072]: https://bugzilla.redhat.com/show_bug.cgi?id=1811072
[rhbz#1881064]: https://bugzilla.redhat.com/show_bug.cgi?id=1881064
[rhbz#1885293]: https://bugzilla.redhat.com/show_bug.cgi?id=1885293
[rhbz#1968088]: https://bugzilla.redhat.com/show_bug.cgi?id=1968088
[rhbz#1985981]: https://bugzilla.redhat.com/show_bug.cgi?id=1985981
[rhbz#1990787]: https://bugzilla.redhat.com/show_bug.cgi?id=1990787
[rhbz#1991654]: https://bugzilla.redhat.com/show_bug.cgi?id=1991654
[rhbz#1992668]: https://bugzilla.redhat.com/show_bug.cgi?id=1992668
[rhbz#1996062]: https://bugzilla.redhat.com/show_bug.cgi?id=1996062
[rhbz#2018969]: https://bugzilla.redhat.com/show_bug.cgi?id=2018969



Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-24 Thread Ken Gaillot
On Wed, 2022-03-23 at 05:30 +, Balotra, Priyanka wrote:
> Hi All,
>  
> We have a scenario on SLES 12 SP3 cluster.
> The scenario is explained as follows in the order of events:
> - There is a 2-node cluster (FILE-1, FILE-2)
> - The cluster and the resources were up and running fine initially.
> - Then a fencing request from pacemaker was issued on both nodes
> simultaneously
>  
> Logs from 1st node:  
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ]
> Failed to receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]:
> notice: Requesting that FILE-1 perform 'off' action targeting FILE-2
>  
> Logs from 2nd node:  
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ]
> Failed to receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith)
> notice: Requesting that FILE-2 perform 'off' action targeting FILE-1
>  
> - When the nodes came up after unfencing, the DC got set after
> election
> - After that, the resources which were expected to run on only one
> node became active on both (all) nodes of the cluster.
>  
> 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource FILE_Filesystem is active on 2 nodes (attempting recovery)
> 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource IP_Floating is active on 2 nodes (attempting recovery)
> 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Service_Postgresql is active on 2 nodes (attempting recovery)
> 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Service_Postgrest is active on 2 nodes (attempting recovery)
> 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Service_esm_primary is active on 2 nodes (attempting recovery)
> 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]:
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active
> for more information
> 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]:
> error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting recovery)
>  
> Can you guys please help us understand if this is indeed a split-brain
> scenario? Under what circumstances can such a scenario be observed?

This does look like a split-brain, and the most likely cause is that
the fence agent reported that fencing was successful, but it actually
wasn't.

What are you using as a fencing device?

If you're using watchdog-based SBD, that won't work with only two
nodes, because both nodes will assume they still have quorum, and not
self-fence. You need either true quorum or a shared external drive to
use SBD.
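As an illustrative sketch of the shared-drive option (the device path and
values below are hypothetical, not from this thread), disk-based SBD is
configured in /etc/sysconfig/sbd roughly like:

```sh
# Hypothetical /etc/sysconfig/sbd fragment for disk-based SBD.
# With a shared LUN, the poison-pill message written to the disk (plus
# the watchdog) provides fencing that does not depend on quorum alone.
SBD_DEVICE="/dev/disk/by-id/scsi-EXAMPLE-shared-lun"
SBD_WATCHDOG_DEV="/dev/watchdog"
SBD_STARTMODE="always"
```

The device id must of course point at a LUN visible to both nodes.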

> We can have very serious impact if such a case re-occurs in spite of
> stonith already being configured. Hence the ask.
> In case this situation gets reproduced, how can it be handled?
> 
> Note: We have stonith configured and it has been working fine so far.
> In this case also, the initial fencing happened from stonith only.
>  
> Thanks in advance!
-- 
Ken Gaillot 
