Re: [ClusterLabs] resources cluster stopped with one node

2024-03-21 Thread Tomas Jelinek

On 20. 03. 24 at 23:56, Ken Gaillot wrote:

On Wed, 2024-03-20 at 23:29 +0100, mierdatutis mi wrote:

Hi,
I've configured a cluster of two nodes.
When I start only one node, I see that the resources won't start.


Hi,

In a two-node cluster, it is not safe to start resources until the
nodes have seen each other once. Otherwise, there's no way to know
whether the other node is unreachable because it is safely down or
because communication has been interrupted (meaning it could still be
running resources).

Corosync's two_node setting automatically takes care of that by also
enabling wait_for_all. If you are certain that the other node is down,
you can disable wait_for_all in the Corosync configuration, start the
node, then re-enable wait_for_all.


If you are using pcs, you can achieve the same thing easily by running 
'pcs quorum unblock' on the online node.
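For reference, the relevant quorum section of /etc/corosync/corosync.conf on a
two-node cluster typically looks roughly like this (node names and the rest of
the file omitted; exact contents depend on how the cluster was created):

   quorum {
       provider: corosync_votequorum
       two_node: 1
       wait_for_all: 1
   }

two_node: 1 implicitly enables wait_for_all, so setting wait_for_all: 0 before
starting the lone node (and back to 1 afterwards) is what the manual workaround
above amounts to. The 'pcs quorum unblock' command achieves the same effect
without editing the file by hand.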


Regards,
Tomas



[root@nodo1 ~]# pcs status --full
Cluster name: mycluster
Stack: corosync
Current DC: nodo1 (1) (version 1.1.23-1.el7-9acf116022) - partition
WITHOUT quorum
Last updated: Wed Mar 20 23:28:45 2024
Last change: Wed Mar 20 19:33:09 2024 by root via cibadmin on nodo1

2 nodes configured
3 resource instances configured

Online: [ nodo1 (1) ]
OFFLINE: [ nodo2 (2) ]

Full list of resources:

  Virtual_IP (ocf::heartbeat:IPaddr2):   Stopped
  Resource Group: HA-LVM
  My_VG  (ocf::heartbeat:LVM-activate):  Stopped
  My_FS  (ocf::heartbeat:Filesystem):Stopped

Node Attributes:
* Node nodo1 (1):

Migration Summary:
* Node nodo1 (1):

Fencing History:

PCSD Status:
   nodo1: Online
   nodo2: Offline

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

Do you know what causes this behavior?
Thanks


Re: [ClusterLabs] Upgrade to OLE8 + Pacemaker

2023-10-03 Thread Tomas Jelinek

On 03. 10. 23 at 16:24, Jibrail, Qusay (GfK) via Users wrote:


Hi Reid,

Thank you for the answer.

So my plan will be:

  * pcs config backup  /root/"Server Name"
  * create a backup of /etc/corosync/
  * create a backup of /etc/postfix
  * pcs cluster stop “server3” -> just to do the failover to server4.
The command pcs cluster stop “server3” will stop corosync and
pacemaker, right?


Hi,

Yes, 'pcs cluster stop' command stops both pacemaker and corosync.
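If in doubt, this is easy to verify on the node itself right after the stop,
e.g.:

   pcs cluster stop server3
   systemctl status corosync pacemaker

Both units should be reported as inactive once the stop command returns, and
'pcs status' run on server3 will then fail with an error saying the cluster is
not running on this node, which is the expected result for your step above.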




  * run pcs status on server3, which should give an error message; and
on server4 it should show one offline node and one online node
  * Upgrade server3 to OLE8, which will upgrade these 3 packages to:
corosync x86_64 3.1.7-1.el8

pacemaker x86_64 2.1.5-9.3.0.1.el8_8

pcs      x86_64 0.10.15-4.0.1.el8_8.1

  * Then run crm_verify to check the configuration. If the
verification is OK then,
  * pcs cluster start “server3”
  * run pcs status on both nodes.

Please see the versions of the currently installed software.

[root@server3 ~]# corosync -v

Corosync Cluster Engine, version '2.4.5'

Copyright (c) 2006-2009 Red Hat, Inc.

[root@server3 ~]# pacemakerd --version

Pacemaker 1.1.23-1.0.1.el7_9.1

Written by Andrew Beekhof

[root@server3 ~]# pcs --version

0.9.169

Did I miss anything?

Corosync 3 is not compatible with Corosync 2. So once you update server3 
to OLE8, it won't be able to join server4 in the cluster and take over 
cluster resources.


If you are restricted to two nodes, you may remove server3 from the 
cluster, update server3 to OLE8 and create a one-node cluster on 
server3. Once you have two one-node clusters, move the resources from the 
server4 cluster to the server3 cluster manually. Then destroy the cluster 
on server4, update server4 to OLE8, and add server4 to the new cluster.
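As a rough sketch only (based on the steps above, with placeholder cluster and
node names; check each command against the pcs man page of the version you end
up with):

   # on server4, while server3 still runs the old stack:
   pcs cluster node remove server3

   # on server3, after the OLE8 upgrade:
   pcs host auth server3 -u hacluster
   pcs cluster setup newcluster server3 --start

   # recreate / move the resources into the new cluster manually, then on server4:
   pcs cluster destroy

   # after upgrading server4 to OLE8, from server3:
   pcs host auth server4 -u hacluster
   pcs cluster node add server4 --start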


Regards,
Tomas


Kind regards,

*––*

*Qusay Jibrail*

Senior Infrastructure Engineer – Linux | GfK IT Services
GfK – an NIQ company |The Netherlands
Krijgsman 22-25 | Amstelveen | 1186 DM
T: +31 88 435 1232 | M: +31 628 927 686





*From:*Reid Wahl 
*Sent:* Tuesday, 3 October 2023 09:03
*To:* Cluster Labs - All topics related to open-source clustering 
welcomed 

*Cc:* Jibrail, Qusay (GfK) 
*Subject:* Re: [ClusterLabs] Upgrade to OLE8 + Pacemaker



On Mon, Oct 2, 2023 at 10:51 PM Jibrail, Qusay (GfK) via Users 
 wrote:


Hello,

I am aiming to upgrade one of the cluster nodes to OLE8 (current
version OLE7) and test whether postfix is working fine.

If yes then upgrade the second node to OLE8.

My questions:

Will the Pacemaker configuration work after the upgrade?

Hi,

It should. Pacemaker supports rolling upgrades from 1.1.11 (and above) 
to 2.x.x. Other components besides Pacemaker may break, so I'd suggest 
having a backout plan before any upgrade activity.


Do I need to make any changes before or after the upgrade to OLE8?

So server3 will be done first and then server4. Is that the right
order?

Do I need to stop any services before the upgrade?

For Pacemaker, follow the procedure at 
https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Administration/singlehtml/#upgrading-a-pacemaker-cluster.


Based on your plan, you should focus particularly on the rolling 
upgrade section.


Refer to your OS vendor's documentation for any other steps you should 
take during an OS upgrade.
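In general terms, the per-node loop from the rolling upgrade procedure boils
down to something like this (note that, as discussed elsewhere in this thread,
a corosync 2 to corosync 3 jump cannot be done as a rolling upgrade):

   pcs cluster stop server3     # take the node out of the cluster
   <upgrade the OS and cluster packages per the vendor documentation>
   pcs cluster start server3    # rejoin the cluster
   pcs status --full            # confirm both nodes are online before continuing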


*[root@server3 ~]# pacemakerd --version*

Pacemaker 1.1.23-1.0.1.el7_9.1

Written by Andrew Beekhof

*[root@server3 ~]# pcs status*

Cluster name: xx

Stack: corosync

Current DC: *server3* (version 1.1.23-1.0.1.el7_9.1-9acf116022) -
partition with quorum

Last updated: Tue Oct  3 07:29:46 2023

Last change: Sun May  1 17:02:03 2022 by hacluster via crmd on server3

2 nodes configured

2 resource instances configured

Online: [ server3 server4 ]

Full list of resources:

Clone Set: smtpout-postfix-res-clone [smtpout-postfix-res]

 Started: [ server3 server4 ]

Daemon Status:

  corosync: active/enabled

  pacemaker: active/enabled

  pcsd: active/enabled

*[root@server3 ~]# postconf -d | grep mail_version*

mail_version = 2.10.1

milter_macro_v = $mail_name 

Re: [ClusterLabs] last man - issue

2023-07-25 Thread Tomas Jelinek

Lejeczek,

Thanks for raising this. We'll improve the error message to make it 
clear that the cluster needs to be stopped in order to proceed.


Tomas


On 24. 07. 23 at 14:22, Michal Pospíšil (he / him) wrote:

Hello all,

One more useful link that I forgot to include in my original reply - 
list of corosync options and when it is possible to update them: 
https://github.com/corosync/corosync/wiki/Config-file-values



Cheers,
Michal

On Mon, 24 Jul 2023 at 13:04, Michal Pospíšil (he / him) 
 wrote:


Hi lejeczek,

Unfortunately, that command shouldn't work in this situation. The
problem is that the cluster (corosync) is running. From the man page:

update [auto_tie_breaker=[0|1]] [last_man_standing=[0|1]]
[last_man_standing_window=[]] [wait_for_all=[0|1]]
    Add, remove or change quorum options. At least one option must be
    specified. Unspecified options will be kept unchanged. If you wish to
    remove an option, set it to empty value, i.e. 'option_name='. Options
    are documented in corosync's votequorum(5) man page. *Requires the
    cluster to be stopped.*

The output looks correct. More inline...

On Sat, 22 Jul 2023 at 13:44, lejeczek via Users
 wrote:

Hi guys.

That below should work, right?

-> $ pcs quorum update last_man_standing=1 --skip-offline
Checking corosync is not running on nodes...


Corosync cannot be running to perform the requested corosync.conf
update.

Warning: Unable to connect to dzien (Failed to connect to
dzien port 2224: No route to host)

Warning: dzien: Unable to check if corosync is not running


Because of `--skip-offline`, these two messages are warnings, not
errors.

Error: swir: corosync is running
Error: whale: corosync is running


It turns out corosync is running on the two other nodes; this is why pcs
refuses to continue. You have to stop the cluster, unfortunately.
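In other words, with pcs the whole sequence would be roughly the following,
which needs a full (short) cluster outage:

   pcs cluster stop --all
   pcs quorum update last_man_standing=1
   pcs cluster start --all

If some nodes are unreachable at that point, the updated corosync.conf still
has to reach them before they rejoin the cluster.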

Error: Errors have occurred, therefore pcs is unable to continue

if it should, why does it not or what else am I missing?

many thanks, L




Cheers,
Michal
-- 


MICHAL POSPISIL
He / Him / His

Associate Software Engineer

RHEL HA Cluster - PCS

Red Hat Czech, s.r.o. 
Purkyňova 97b, 612 00 Brno

mposp...@redhat.com IRC: mpospisi











[ClusterLabs] pcs 0.11.6 released

2023-06-21 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.6.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.6.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.6.zip

This release fixes a regression causing crashes in `pcs resource move`.
Command `pcs config checkpoint diff` is also fixed. On top of that, it
is now possible to export constraints and cluster properties as pcs
commands or JSON, which brings us closer to our goal of exporting the
whole cluster configuration in these formats.
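For example, assuming the --output-format option these config commands use in
recent pcs releases, the new formats can be requested like this:

   pcs constraint config --output-format=cmd
   pcs property config --output-format=json

The 'cmd' format prints the pcs commands that would recreate the configuration,
which is handy for rebuilding a cluster or keeping its setup under version
control; 'json' is meant for machine consumption.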


Complete change log for this release:
## [0.11.6] - 2023-06-20

### Added
- Support for output formats `json` and `cmd` to constraints config
  commands ([rhbz#2179388], [rhbz#1423473], [rhbz#2163953])
- Automatic restarts of the Puma web server in the legacy Ruby daemon to
  reduce its memory footprint ([rhbz#1860626])
- New URL for listing pcsd capabilities: `/capabilities`
- It is now possible to list pcsd capabilities even if pcsd is not
  running: `pcsd --version --full`
- Add lib commands `cluster_property.get_properties` and
  `cluster_property.get_properties_metadata` to API v2
- Add `pcs property defaults` and `pcs property describe` CLI commands
- Support for output formats `json` and `cmd` to property config command
  ([rhbz#2163914])
- Commands `pcs resource describe` and `pcs stonith describe` print
  detailed info about resource options (data type or allowed values,
  default value)
- Add warning to `pcs resource utilization` and `pcs node utilization`
  for the case configuration is not in effect (cluster property
  `placement-strategy` is not set appropriately) ([rhbz#1465829])
- New format of `pcs resource create` command which requires `meta`
  keyword for specifying clone and promotable meta attributes is
  available to be enabled by specifying --future ([rhbz#2168155])

### Fixed
- Crash in commands that ask for user input (like `pcs cluster destroy`)
  when stdin is closed ([ghissue#612])
- Fix displaying differences between configuration checkpoints in `pcs
  config checkpoint diff` command ([rhbz#2175881])
- Fix `pcs stonith update-scsi-devices` command which was broken since
  Pacemaker-2.1.5-rc1 ([rhbz#2177996])
- Make `pcs resource disable --simulate --brief` documentation clearer
  ([rhbz#2109852])
- Fixed a regression causing crash in `pcs resource move` command
  (broken since pcs-0.11.5) ([rhbz#2210855])
- Using `--force` in `pcs resource meta` command had no effect on a
  specific error message even if the message suggested otherwise.

### Changed
- Commands for displaying cluster configuration have been slightly
  updated:
  - Headings of empty sections are no longer displayed
  - Resource listing is more dense as operations options are shown in a
single line
  - Specifying `--full` to show IDs of elements now shows IDs of nvpairs
as well

### Deprecated
- Specifying clone and promotable meta attributes without the `meta`
  keyword is now deprecated, i.e. `pcs resource clone myResource
  name=value` is deprecated by `pcs resource clone myResource meta
  name=value` ([rhbz#2168155], [ghpull#648])


Thanks / congratulations to everyone who contributed to this release,
including lixin, Michal Pospisil, Miroslav Lisik, Ondrej Mular and Tomas
Jelinek.

Cheers,
Tomas


[ghissue#612]: https://github.com/ClusterLabs/pcs/issues/612
[ghpull#648]: https://github.com/ClusterLabs/pcs/pull/648
[rhbz#1423473]: https://bugzilla.redhat.com/show_bug.cgi?id=1423473
[rhbz#1465829]: https://bugzilla.redhat.com/show_bug.cgi?id=1465829
[rhbz#1860626]: https://bugzilla.redhat.com/show_bug.cgi?id=1860626
[rhbz#2109852]: https://bugzilla.redhat.com/show_bug.cgi?id=2109852
[rhbz#2163914]: https://bugzilla.redhat.com/show_bug.cgi?id=2163914
[rhbz#2163953]: https://bugzilla.redhat.com/show_bug.cgi?id=2163953
[rhbz#2168155]: https://bugzilla.redhat.com/show_bug.cgi?id=2168155
[rhbz#2175881]: https://bugzilla.redhat.com/show_bug.cgi?id=2175881
[rhbz#2177996]: https://bugzilla.redhat.com/show_bug.cgi?id=2177996
[rhbz#2179388]: https://bugzilla.redhat.com/show_bug.cgi?id=2179388
[rhbz#2210855]: https://bugzilla.redhat.com/show_bug.cgi?id=2210855



[ClusterLabs] pcs 0.10.17 released

2023-06-20 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.17.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.17.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.17.zip

Complete change log for this release:
## [0.10.17] - 2023-06-19

### Added
- Automatic restarts of the Puma web server in the legacy Ruby daemon to
  reduce its memory footprint ([rhbz#2189958])
- Add `pcs property defaults` and `pcs property describe` CLI commands
- Support for output formats `json` and `cmd` to property config command
  ([rhbz#2166289])
- Commands `pcs resource describe` and `pcs stonith describe` print
  detailed info about resource options (data type or allowed values,
  default value)
- Add warning to `pcs resource utilization` and `pcs node utilization`
  for the case configuration is not in effect (cluster property
  `placement-strategy` is not set appropriately) ([rhbz#2112259])


Thanks / congratulations to everyone who contributed to this release,
including Michal Pospisil, Miroslav Lisik and Tomas Jelinek.

Cheers,
Tomas


[rhbz#2112259]: https://bugzilla.redhat.com/show_bug.cgi?id=2112259
[rhbz#2166289]: https://bugzilla.redhat.com/show_bug.cgi?id=2166289
[rhbz#2189958]: https://bugzilla.redhat.com/show_bug.cgi?id=2189958



Re: [ClusterLabs] Change hacluster password

2023-06-14 Thread Tomas Jelinek

Hi Jérôme,

Assuming you are asking about changing the 'hacluster' password and its 
impact on pcs authentication, the answer is that there is no impact and 
you don't need to re-authenticate your nodes. If you have no tokens or 
known-hosts in /var/lib/pcsd, then your nodes are not authenticated to 
begin with, anyway.


To re-authenticate pcs on your cluster nodes, run 'pcs cluster auth'. 
This will authenticate all nodes in the local cluster. Alternatively, 
you can use the 'pcs host auth' command, which allows you to specify nodes 
manually. Neither of these commands has any impact on resources running in 
the cluster.
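For example, to refresh the authentication after the password change (node
names are placeholders; without -p you will be prompted for the password):

   pcs cluster auth -u hacluster
   # or, naming the nodes explicitly:
   pcs host auth node1 node2 -u hacluster

Either way only the pcsd tokens under /var/lib/pcsd are refreshed; resources
running in the cluster are not touched.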


If you are using pcs web UI, you need to use new passwords when logging 
in, obviously.


Regards,
Tomas


On 13. 06. 23 at 11:43, Jérôme BECOT wrote:

Hello,

On a running cluster, if I want to change the user's password on all 
nodes, would I need to re-authenticate the nodes to correctly apply this 
change in Pacemaker? How do I gently re-authenticate on Debian? Would 
this need a maintenance window?


I found this link: https://access.redhat.com/articles/1396123

But I have no tokens in /var/lib/pcsd on my nodes

Thank you

--
*Jérôme BECOT*



Re: [ClusterLabs] pcs property maintenance mode --wait not supported

2023-06-02 Thread Tomas Jelinek



On 02. 06. 23 at 3:33, Reid Wahl wrote:

On Thu, Jun 1, 2023 at 6:38 AM S Sathish S via Users
 wrote:


Hi Team,

The ‘--wait’ option is not supported when setting maintenance mode via pcs 
property, although it worked in the earlier pcs-0.9.x version. May I know why 
the --wait option was removed? Could you please help with this.

  pcs property set maintenance-mode=true/false --wait=120

  [root@node1 user]#  pcs property set maintenance-mode=false --wait=120
Error: Specified option '--wait' is not supported in this command
[root@node1 user]#

  Below rpms we are using.
pcs-0.10.16-1.el8.x86_64
pacemaker-2.1.5-1.el8.x86_64
corosync-3.1.7-1.el8.x86_64
resource-agents-4.12.0-1.el8.x86_64
libknet1-1.25-1.el8.x86_64

Thanks and Regards,
S Sathish S


I believe `--wait` was ignored for the property command in pcs-0.9 but
didn't throw an error. It looks like the argument validation has
improved in pcs-0.10. One of the pcs devs can correct me if I'm wrong.


That is correct, `--wait` was never implemented in the `pcs property set` 
command. It's not even documented.


Regards,
Tomas







Re: [ClusterLabs] HSTS Missing From HTTPS Server on pcs daemon

2023-04-06 Thread Tomas Jelinek

Hi S Sathish S,

The new pcs-0.10.16 version containing the fix for this issue has just been 
released upstream.


Regards,
Tomas


On 04. 04. 23 at 19:14, S Sathish S wrote:

Hi Tomas/Team,

In our case the PCS web UI is disabled; while accessing the PCS web UI URL we 
are getting a 404 response. As you stated, we are getting this vulnerability 
“HSTS Missing From HTTPS Server” in the Tenable scan.


While going through the changelog we can see the fix is available in an 
unreleased version. Can we know when we can expect a formal release? Any 
tentative timeline, please.


Set `Content-Security-Policy: frame-ancestors 'self'; default-src 
'self'` HTTP header for HTTP 404 responses (rhbz#2160555)


Thanks and Regards,
S Sathish S





[ClusterLabs] pcs 0.10.16 released

2023-04-06 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.16.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.16.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.16.zip

Extended validation of resource and stonith attributes, added in the
previous release, is now disabled by default.
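If you do want the agent's own validation for a particular command, it can
still be requested explicitly, for example (hypothetical resource name and
option):

   pcs resource update my-vip ip=192.168.1.10 --agent-validation

Without the flag, pcs only applies its own built-in checks.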

Complete change log for this release:
## [0.10.16] - 2023-04-06

### Added
- Warning to `pcs resource|stonith update` commands about not using
  agent self-validation feature when the resource is already
  misconfigured ([rhbz#2151511])

### Fixed
- Displaying bool and integer values in `pcs resource config` command
  ([rhbz#2151166], [ghissue#604])
- Allow time values in stonith-watchdog-time property ([rhbz#2158804])
- Enable/Disable sbd when cluster is not running ([rhbz#2166243])
- Confusing error message in `pcs constraint ticket add` command
  ([rhbz#2022748], [ghpull#559])
- Set `Content-Security-Policy: frame-ancestors 'self'; default-src
  'self'` HTTP header for HTTP 404 responses ([rhbz#2160555])
- Set `autocomplete="off"` for a password input field in login page in
  web UI ([rhbz#1957591])
- Validate dates in location constraint rules ([ghpull#644],
  [rhbz#2178707])
- Fix displaying differences between configuration checkpoints in `pcs
  config checkpoint diff` command ([rhbz#2176490])
- Crash in commands that ask for user input (like `pcs cluster destroy`)
  when stdin is closed ([ghissue#612])
- Fix `pcs stonith update-scsi-devices` command which was broken since
  Pacemaker-2.1.5-rc1 ([rhbz#2179010])
- Displaying location constraints with rules when grouped by nodes
  ([rhbz#2166294])

### Changed
- Resource/stonith agent self-validation of instance attributes is now
  disabled by default, as many agents do not work with it properly. Use
  flag '--agent-validation' to enable it in supported commands.
  ([rhbz#2159455])


Thanks / congratulations to everyone who contributed to this release,
including lixin, Lucas Kanashiro, Mamoru TASAKA, Michal Pospisil,
Miroslav Lisik, Ondrej Mular, Tomas Jelinek and wangluwei.

Cheers,
Tomas


[ghissue#604]: https://github.com/ClusterLabs/pcs/issues/604
[ghissue#612]: https://github.com/ClusterLabs/pcs/issues/612
[ghpull#559]: https://github.com/ClusterLabs/pcs/pull/559
[ghpull#644]: https://github.com/ClusterLabs/pcs/pull/644
[rhbz#1957591]: https://bugzilla.redhat.com/show_bug.cgi?id=1957591
[rhbz#2022748]: https://bugzilla.redhat.com/show_bug.cgi?id=2022748
[rhbz#2151166]: https://bugzilla.redhat.com/show_bug.cgi?id=2151166
[rhbz#2151511]: https://bugzilla.redhat.com/show_bug.cgi?id=2151511
[rhbz#2158804]: https://bugzilla.redhat.com/show_bug.cgi?id=2158804
[rhbz#2159455]: https://bugzilla.redhat.com/show_bug.cgi?id=2159455
[rhbz#2160555]: https://bugzilla.redhat.com/show_bug.cgi?id=2160555
[rhbz#2166243]: https://bugzilla.redhat.com/show_bug.cgi?id=2166243
[rhbz#2166294]: https://bugzilla.redhat.com/show_bug.cgi?id=2166294
[rhbz#2176490]: https://bugzilla.redhat.com/show_bug.cgi?id=2176490
[rhbz#2178707]: https://bugzilla.redhat.com/show_bug.cgi?id=2178707
[rhbz#2179010]: https://bugzilla.redhat.com/show_bug.cgi?id=2179010



Re: [ClusterLabs] HSTS Missing From HTTPS Server on pcs daemon

2023-04-04 Thread Tomas Jelinek

Hi S Sathish S,

pcs has been sending the Strict-Transport-Security header since version 
pcs-0.9.168. There were further fixes in the pcs-0.10 branch which you can 
find in the pcs changelog [1]:
* in pcs-0.10.5: Added missing Strict-Transport-Security headers to 
redirects
* in pcs-0.10.14: Set 'Strict-Transport-Security: max-age=63072000' HTTP 
header for all responses


The only known bug regarding the header is that it is not being sent in 
HTTP 404 responses (requests for non-existent URLs). This is already 
fixed upstream and the fix will be included in the upcoming pcs release.


If you think the header is missing somewhere else, please provide a 
reproducer, so we can take a closer look at it.
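As a quick check, the header can be inspected directly with curl against the
pcsd port, for example (replace node1 with a real node name):

   curl -k -s -D - -o /dev/null https://node1:2224/ | grep -i strict-transport-security

Any URL that comes back without the header (other than the known 404 case
mentioned above) would be the reproducer to report.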



Regards,
Tomas


[1]: https://github.com/ClusterLabs/pcs/blob/pcs-0.10/CHANGELOG.md



On 03. 04. 23 at 15:37, S Sathish S via Users wrote:

Hi Team,

In our product we are using pcs version 0.10.15. While running a Tenable scan, 
the below vulnerability was reported on the pcsd daemon on port 2224. Moreover, 
we have disabled the PCSD Web UI in our application, yet the vulnerability is 
still reported on the system.


Plugin ID : 84502

Plugin Name : HSTS Missing From HTTPS Server

Please provide any mitigation plan for this.

Thanks and Regards,
S Sathish S




[ClusterLabs] pcs 0.11.5 released

2023-03-02 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.5.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.5.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.11.5.zip

Extended validation of resource and stonith attributes, added in the
previous release, is now disabled by default. We plan to enable it in
the future, once related issues in agents are resolved.
Other than that, this release brings a bunch of bug fixes.

Complete change log for this release:
## [0.11.5] - 2023-03-01

### Added
- Warning to `pcs resource|stonith update` commands about not using
  agent self-validation feature when the resource is already
  misconfigured ([rhbz#2151524])
- Add lib command `cluster_property.set_properties` to API v2
- Commands for checking and creating qdevice certificates on the local
  node only

### Fixed
- Graceful stopping pcsd service using `systemctl stop pcsd` command
- Displaying bool and integer values in `pcs resource config` command
  ([rhbz#2151164], [ghissue#604])
- Allow time values in stonith-watchdog-timeout property
  ([rhbz#2158790])
- Enable/Disable sbd when cluster is not running ([rhbz#2166249])
- Confusing error message in `pcs constraint ticket add` command
  ([rhbz#2168617], [ghpull#559])
- Internal server error during cluster setup with Ruby 3.2
- Set `Content-Security-Policy: frame-ancestors 'self'; default-src
  'self'` HTTP header for HTTP 404 responses as well ([rhbz#2160664])
- Validate dates in location constraint rules ([ghpull#644])

### Changed
- Resource/stonith agent self-validation of instance attributes is now
  disabled by default, as many agents do not work with it properly. Use
  flag '--agent-validation' to enable it in supported commands.
  ([rhbz#2159454])


Thanks / congratulations to everyone who contributed to this release,
including lixin, Lucas Kanashiro, Mamoru TASAKA, Michal Pospisil,
Miroslav Lisik, Ondrej Mular, Tomas Jelinek and wangluwei.

Cheers,
Tomas


[ghissue#604]: https://github.com/ClusterLabs/pcs/issues/604
[ghpull#559]: https://github.com/ClusterLabs/pcs/pull/559
[ghpull#644]: https://github.com/ClusterLabs/pcs/pull/644
[rhbz#2151164]: https://bugzilla.redhat.com/show_bug.cgi?id=2151164
[rhbz#2151524]: https://bugzilla.redhat.com/show_bug.cgi?id=2151524
[rhbz#2158790]: https://bugzilla.redhat.com/show_bug.cgi?id=2158790
[rhbz#2159454]: https://bugzilla.redhat.com/show_bug.cgi?id=2159454
[rhbz#2160664]: https://bugzilla.redhat.com/show_bug.cgi?id=2160664
[rhbz#2166249]: https://bugzilla.redhat.com/show_bug.cgi?id=2166249
[rhbz#2168617]: https://bugzilla.redhat.com/show_bug.cgi?id=2168617



Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

2023-01-31 Thread Tomas Jelinek

Hi A Gunasekar,

These CVEs are fixed in pcs-0.10.9 and newer and pcs-0.11.1 and newer 
(the 0.11 branch was never affected).


Regards,
Tomas


On 27. 01. 23 at 9:01, A Gunasekar via Users wrote:


Hi Tomas/Team,

It would be great if you could share in which ClusterLabs version the 
fixes are available for these CVEs, so that we can take that version 
for the upgrade.


Ericsson 


*Gunasekar A ***

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com **

Hi A Gunasekar,

The pcs-0.9 branch is unsupported and no longer maintained since

2021-04-16. There will be no further releases and commits in that

branch. Pcs-0.9 only works with Pacemaker 1.x and Corosync 2.x and those

have been unsupported for quite some time as well.

I recommend updating your cluster stack to newer versions.

Regards,

Tomas

*From:*A Gunasekar
*Sent:* 20 January 2023 15:55
*To:* Reid Wahl ; Cluster Labs - All topics related 
to open-source clustering welcomed 
*Cc:* M Vasanthakumar ; S Sathish S 


*Subject:* RE: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

Hi Wahl/Team,

The solution Tomas suggested is from Red Hat-delivered rpm packages 
“pcs-0.9.169-3.el7_9.3”.


But we are using ClusterLabs source packages to build pcs rpms for 
our node.


So it would be good if we could get the fixed release details from 
ClusterLabs for the reported CVEs.


Ericsson 


*Gunasekar A *

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com **

*From:*A Gunasekar
*Sent:* 20 January 2023 15:12
*To:* Reid Wahl 
*Cc:* M Vasanthakumar ; S Sathish S 


*Subject:* RE: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

Thanks Wahl for this information

*From:*Reid Wahl 
*Sent:* 20 January 2023 11:57
*To:* A Gunasekar 
*Cc:* M Vasanthakumar ; S Sathish S 


*Subject:* Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

On Thu, Jan 19, 2023 at 9:19 PM A Gunasekar  
wrote:


Hi Wahl,

Tomas's update was not visible to us, and thanks for sharing it here.

https://lists.clusterlabs.org/pipermail/users/2022-December/030734.html



You're welcome. Unfortunately, the threads are separated by month. So 
if a reply is sent in a different month, it doesn't appear in the 
original thread. You sent your original email in December, and Tomas 
replied in January. See the following links:


https://lists.clusterlabs.org/pipermail/users/2023-January/thread.html 



https://lists.clusterlabs.org/pipermail/users/2023-January/030750.html 



Ericsson 


*Gunasekar A *

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com 

*From:*Reid Wahl 
*Sent:* 20 January 2023 03:07
*To:* Cluster Labs - All topics related to open-source clustering
welcomed 
*Cc:* A Gunasekar ; M Vasanthakumar
; S Sathish S 
*Subject:* Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

On Thu, Jan 19, 2023 at 12:54 PM A Gunasekar via Users
 wrote:

Hi Team,

Can we get some update on this.

Hi,

What update are you seeking? It looks like Tomas already answered
your question. I'll paste his answer again here.

> Hi A Gunasekar,
>
> As far as I can see, updated pcs packages pcs-0.9.169-3.el7_9.3
which
> fix the mentioned CVEs were released on 2022-11-02.
>
> Regards,
> Tomas

Ericsson 


*Gunasekar A *

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com

*From:*A Gunasekar
*Sent:* 21 December 2022 18:59
*To:* users@clusterlabs.org
*Cc:* S Sathish S ; M Vasanthakumar

*Subject:* Fix for CVE-2022-30123 and CVE-2019-11358

Hi Team,

Please be informed, we have got notified from our security
tool that our pcs version 0.9 is affected by the
*CVE-2022-30123 and CVE-2019-11358*.

It 

Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

2023-01-23 Thread Tomas Jelinek

Hi A Gunasekar,

The pcs-0.9 branch is unsupported and no longer maintained since 
2021-04-16. There will be no further releases and commits in that 
branch. Pcs-0.9 only works with Pacemaker 1.x and Corosync 2.x and those 
have been unsupported for quite some time as well.


I recommend updating your cluster stack to newer versions.

Regards,
Tomas


Dne 20. 01. 23 v 11:23 Reid Wahl napsal(a):



On Fri, Jan 20, 2023 at 2:19 AM A Gunasekar > wrote:


Hi Wahl.


The solution Tomas suggested is from Red Hat-delivered rpm
packages “pcs-0.9.169-3.el7_9.3”.

But we are using ClusterLabs-delivered rpm packages on our node.

So it would be good if we could get fixed deliverables as ClusterLabs
rpms.



+ users list

Please include the mailing list on emails




Ericsson 

*Gunasekar A ***

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com
**


*From:* A Gunasekar
*Sent:* 20 January 2023 15:12
*To:* Reid Wahl <nw...@redhat.com>
*Cc:* M Vasanthakumar <m.vasanthaku...@ericsson.com>; S Sathish S <s.s.sath...@ericsson.com>
*Subject:* RE: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358


Thanks Wahl for this information 


*From:* Reid Wahl <nw...@redhat.com>
*Sent:* 20 January 2023 11:57
*To:* A Gunasekar <a.gunase...@ericsson.com>
*Cc:* M Vasanthakumar <m.vasanthaku...@ericsson.com>; S Sathish S <s.s.sath...@ericsson.com>
*Subject:* Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358


On Thu, Jan 19, 2023 at 9:19 PM A Gunasekar <a.gunase...@ericsson.com> wrote:

Hi Wahl,



Tomas's update was not visible to us, and thanks for sharing it
here.

https://lists.clusterlabs.org/pipermail/users/2022-December/030734.html 



You're welcome. Unfortunately, the threads are separated by month.
So if a reply is sent in a different month, it doesn't appear in the
original thread. You sent your original email in December, and Tomas
replied in January. See the following links:

https://lists.clusterlabs.org/pipermail/users/2023-January/thread.html 


https://lists.clusterlabs.org/pipermail/users/2023-January/030750.html 







Ericsson 

*Gunasekar A *

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com


*From:* Reid Wahl <nw...@redhat.com>
*Sent:* 20 January 2023 03:07
*To:* Cluster Labs - All topics related to open-source clustering welcomed <users@clusterlabs.org>
*Cc:* A Gunasekar <a.gunase...@ericsson.com>; M Vasanthakumar <m.vasanthaku...@ericsson.com>; S Sathish S <s.s.sath...@ericsson.com>
*Subject:* Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358







On Thu, Jan 19, 2023 at 12:54 PM A Gunasekar via Users <users@clusterlabs.org> wrote:

Hi Team,



Can we get some update on this.



Hi,



What update are you seeking? It looks like Tomas already
answered your question. I'll paste his answer again here.



 > Hi A Gunasekar,
 >
 > As far as I can see, updated pcs packages
pcs-0.9.169-3.el7_9.3 which
 > fix the mentioned CVEs were released on 2022-11-02.
 >
 > Regards,
 > Tomas







Ericsson 

Re: [ClusterLabs] multiple resources - pgsqlms - and IP(s)

2023-01-04 Thread Tomas Jelinek

On 04. 01. 23 at 8:29, Reid Wahl wrote:

On Tue, Jan 3, 2023 at 10:53 PM lejeczek via Users
 wrote:




On 03/01/2023 21:44, Ken Gaillot wrote:

On Tue, 2023-01-03 at 18:18 +0100, lejeczek via Users wrote:

On 03/01/2023 17:03, Jehan-Guillaume de Rorthais wrote:

Hi,

On Tue, 3 Jan 2023 16:44:01 +0100
lejeczek via Users  wrote:


To get/have Postgresql cluster with 'pgsqlms' resource, such
cluster needs a 'master' IP - what do you guys do when/if
you have multiple resources off this agent?
I wonder if it is possible to keep just one IP and have all
those resources go to it - probably 'scoring' would be very
tricky then, or perhaps not?

That would mean all promoted pgsql instances MUST be on the same node at any
time.
If one of your instances got into trouble and needed to fail over,
*ALL* of them would fail over.

This implies not just a small failure time window for one instance,
but for all of them, all the users.


Or you do separate IP for each 'pgsqlms' resource - the
easiest way out?

That looks like a better option to me, yes.

Regards,

Not related - Is this an old bug?:

-> $ pcs resource create pgsqld-apps ocf:heartbeat:pgsqlms
bindir=/usr/bin pgdata=/apps/pgsql/data op start timeout=60s
op stop timeout=60s op promote timeout=30s op demote
timeout=120s op monitor interval=15s timeout=10s
role="Master" op monitor interval=16s timeout=10s
role="Slave" op notify timeout=60s meta promotable=true
notify=true master-max=1 --disable
Error: Validation result from agent (use --force to override):
 ocf-exit-reason:You must set meta parameter notify=true
for your master resource
Error: Errors have occurred, therefore pcs is unable to continue

pcs now runs an agent's validate-all action before creating a resource.
In this case it's detecting a real issue in your command. The options
you have after "meta" are clone options, not meta options of the
resource being cloned. If you just change "meta" to "clone" it should
work.

Nope. Exact same error message.
If I remember correctly there was a bug specifically
pertaining to 'notify=true'.


The only recent one I can remember was a core dump.
- Bug 2039675 - pacemaker coredump with ocf:heartbeat:mysql resource
(https://bugzilla.redhat.com/show_bug.cgi?id=2039675)

 From a quick inspection of the pcs resource validation code
(lib/pacemaker/live.py:validate_resource_instance_attributes_via_pcmk()),
it doesn't look like it passes the meta attributes. It only passes the
instance attributes. (I could be mistaken.)

The pgsqlms resource agent checks the notify meta attribute's value as
part of the validate-all action. If pcs doesn't pass the meta
attributes to crm_resource, then the check will fail.



Pcs cannot pass meta attributes to crm_resource, because there is 
nowhere to pass them to. As defined in OCF 1.1, only instance attributes 
matter for validation, see 
https://github.com/ClusterLabs/OCF-spec/blob/main/ra/1.1/resource-agent-api.md#check-levels



The agents are bugged - they depend on meta data being passed to 
validation. This is already tracked and being worked on:


https://github.com/ClusterLabs/resource-agents/pull/1826

Bug 2157872 - resource-agents: fix validate-all issue with new 
pcs/pacemaker by only running some checks when OCF_CHECK_LEVEL=10

https://bugzilla.redhat.com/show_bug.cgi?id=2157872

2149113 - pcs can't create MS SQL Server cluster resources
https://bugzilla.redhat.com/show_bug.cgi?id=2149113


Regards,
Tomas


I'm on C8S with resource-agents-paf-4.9.0-35.el8.x86_64.



Re: [ClusterLabs] Fix for CVE-2022-30123 and CVE-2019-11358

2023-01-02 Thread Tomas Jelinek

Hi A Gunasekar,

As far as I can see, updated pcs packages pcs-0.9.169-3.el7_9.3 which 
fix the mentioned CVEs were released on 2022-11-02.


Regards,
Tomas


On 21. 12. 22 at 14:28, A Gunasekar via Users wrote:

Hi Team,

Please be informed, we have got notified from our security tool that our 
pcs version 0.9 is affected by the *CVE-2022-30123 and CVE-2019-11358*.


It would be great if you could help us get answers to the queries below.

  * We are currently on RHEL 7.9 and using pcs version 0.9. Is there
any fix planned/available for this affected version (0.9.x) of pcs?
  * Let us know in which release the fixes for these CVEs are planned.

*Our system Details:-*

OS Version: RHEL 7.9

Cluster lab PCS  version: 0.9

Ericsson 

*Gunasekar A ***

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com **




Re: [ClusterLabs] PCS WEB UI is not reachable via inside/outside of the server getting "404 Page Not Found" Error

2022-12-15 Thread Tomas Jelinek

Hi S Sathish,

It looks like you have disabled the web UI by putting 
'PCSD_DISABLE_GUI=true' into pcsd config file ('/etc/default/pcsd' or 
'/etc/sysconfig/pcsd', depending on your Linux distribution).


The 'pcsd.conf' file is not present in pcs-0.10 and pcs-0.11, that is 
expected.



Regards,
Tomas


On 14. 12. 22 at 6:46, S Sathish S via Users wrote:

Hi Team,

We took pcs version 0.10.14 and then compiled and built the rpm 
successfully. All pcs remote commands are working as expected; we are able 
to form the cluster and create the required resource groups.


Issue: the PCS web UI is not reachable from inside or outside of the server; 
we are getting a “404 Page Not Found” error. Please find input from our 
analysis below.


# curl -k -vvv https://node1:2224/ 

* Rebuilt URL to: https://node1:2224/ 

*   Trying 10.x.x.x...

* TCP_NODELAY set

* Connected to node1 (10.x.x.x) port 2224 (#0)

* ALPN, offering h2

* ALPN, offering http/1.1

* successfully set certificate verify locations:

*   CAfile: /etc/pki/tls/certs/ca-bundle.crt

   CApath: none

* TLSv1.3 (OUT), TLS handshake, Client hello (1):

* TLSv1.3 (IN), TLS handshake, Server hello (2):

* TLSv1.3 (IN), TLS handshake, [no content] (0):

* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):

* TLSv1.3 (IN), TLS handshake, [no content] (0):

* TLSv1.3 (IN), TLS handshake, Certificate (11):

* TLSv1.3 (IN), TLS handshake, [no content] (0):

* TLSv1.3 (IN), TLS handshake, CERT verify (15):

* TLSv1.3 (IN), TLS handshake, [no content] (0):

* TLSv1.3 (IN), TLS handshake, Finished (20):

* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):

* TLSv1.3 (OUT), TLS handshake, [no content] (0):

* TLSv1.3 (OUT), TLS handshake, Finished (20):

* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384

* ALPN, server did not agree to a protocol

* Server certificate:

*  subject: C=IN; ST=TN; L=CHENNAI; O=Organe; OU=OCC; CN=node1.

*  start date: Dec  2 06:22:52 2022 GMT

*  expire date: Nov 29 06:22:52 2032 GMT

*  issuer: C=IN; ST=TN; L=CHENNAI; O=Organe; OU=OCC; CN=node1.

*  SSL certificate verify result: self signed certificate (18), 
continuing anyway.


* TLSv1.3 (OUT), TLS app data, [no content] (0):


GET / HTTP/1.1



Host: node1:2224



User-Agent: curl/7.61.1



Accept: */*






* TLSv1.3 (IN), TLS handshake, [no content] (0):

* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):

* TLSv1.3 (IN), TLS handshake, [no content] (0):

* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):

* TLSv1.3 (IN), TLS app data, [no content] (0):

< HTTP/1.1 404 Not Found

< Server: TornadoServer/6.1

< Content-Type: text/html; charset=UTF-8

< Date: Tue, 13 Dec 2022 00:43:56 GMT

< Content-Length: 69

<

* Connection #0 to host node1 left intact

404: Not Found404: Not Found

Kindly let me know if any additional information is needed for the analysis.

One more observation: the older pcs-0.9 version has both 
/usr/lib/pcsd/pcsd.conf and /etc/sysconfig/pcsd configuration files. From 
pcs-0.10 onward, after building the rpm we don't see the 
/usr/lib/pcsd/pcsd.conf file. Is configuring only the /etc/sysconfig/pcsd 
file sufficient on the latest pcs-0.10 versions?


Thanks and Regards,
S Sathish




[ClusterLabs] pcs 0.11.4 released

2022-11-29 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.4.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.4.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.4.zip

This release brings improved validation of resource instance attributes
and cluster properties. There are also important fixes for a security
issue and a few regressions.

Complete change log for this release:
## [0.11.4] - 2022-11-21

### Security
- CVE-2022-2735 pcs: obtaining an authentication token for hacluster
  user could lead to privilege escalation ([rhbz#2116841])

### Added
- API v2 providing asynchronous interface for pcsd. Note that this
  feature is in tech-preview state and thus may be changed in the future
- Support for resource/stonith agent self-validation of instance
  attributes via pacemaker ([rhbz#2112270])
- Support for booth 'enable-authfile' fix ([rhbz#2116295])

### Fixed
- `pcs resource manage --monitor` no longer enables monitor operation
  for all resources in a group if only one of the resources was
  requested to become managed ([rhbz#2092950])
- `pcs resource restart` command works again (broken in pcs-0.11.3)
  ([rhbz#2102663])
- Misleading error message from `pcs booth sync` when booth config
  directory (`/etc/booth`) is missing ([rhbz#1791670])
- Creating a promotable or globally-unique clones is not allowed for
  non-ocf resource agents ([rhbz#1493416])
- Improved cluster properties validators, OCF 1.1 now supported
  ([rhbz#2019464])
- `pcs property set/unset` forbid manipulation of specific cluster
  properties ([rhbz#1620043])


Thanks / congratulations to everyone who contributed to this release,
including Fabio M. Di Nitto, Ivan Devat, lixin, Lucas Kanashiro, Michal
Pospisil, Miroslav Lisik, Ondrej Mular and Tomas Jelinek.

Cheers,
Tomas


[rhbz#1493416]: https://bugzilla.redhat.com/show_bug.cgi?id=1493416
[rhbz#1620043]: https://bugzilla.redhat.com/show_bug.cgi?id=1620043
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#2019464]: https://bugzilla.redhat.com/show_bug.cgi?id=2019464
[rhbz#2092950]: https://bugzilla.redhat.com/show_bug.cgi?id=2092950
[rhbz#2102663]: https://bugzilla.redhat.com/show_bug.cgi?id=2102663
[rhbz#2112270]: https://bugzilla.redhat.com/show_bug.cgi?id=2112270
[rhbz#2116295]: https://bugzilla.redhat.com/show_bug.cgi?id=2116295
[rhbz#2116841]: https://bugzilla.redhat.com/show_bug.cgi?id=2116841



[ClusterLabs] pcs 0.10.15 released

2022-11-29 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.15.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.15.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.15.zip

This release brings improved validation of resource instance attributes
and cluster properties. There are also important fixes for a security
issue and a few regressions.

Complete change log for this release:
## [0.10.15] - 2022-11-23

### Security
- CVE-2022-2735 pcs: obtaining an authentication token for hacluster
  user could lead to privilege escalation ([rhbz#2116838])

### Added
- Support for resource/stonith agent self-validation of instance
  attributes via pacemaker ([rhbz#1816852])
- Support for booth 'enable-authfile' fix ([rhbz#2132582])

### Fixed
- `pcs resource manage --monitor` no longer enables monitor operation
  for all resources in a group if only one of the resources was
  requested to become managed ([rhbz#1918527])
- Misleading error message from `pcs booth sync` when booth config
  directory (`/etc/booth`) is missing ([rhbz#1791670])
- `pcs quorum device remove` works again (broken since 0.10.13)
  ([rhbz#2115326])
- SBD enable from webui works again ([rhbz#2117650])
- Improved cluster properties validators, OCF 1.1 now supported
  ([rhbz#2112002])
- `pcs property set/unset` forbid manipulation of specific cluster
  properties ([rhbz#2112263])


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, lixin, Michal Pospisil, Miroslav Lisik, Ondrej
Mular and Tomas Jelinek.

Cheers,
Tomas


[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1816852]: https://bugzilla.redhat.com/show_bug.cgi?id=1816852
[rhbz#1918527]: https://bugzilla.redhat.com/show_bug.cgi?id=1918527
[rhbz#2112002]: https://bugzilla.redhat.com/show_bug.cgi?id=2112002
[rhbz#2112263]: https://bugzilla.redhat.com/show_bug.cgi?id=2112263
[rhbz#2115326]: https://bugzilla.redhat.com/show_bug.cgi?id=2115326
[rhbz#2116838]: https://bugzilla.redhat.com/show_bug.cgi?id=2116838
[rhbz#2117650]: https://bugzilla.redhat.com/show_bug.cgi?id=2117650
[rhbz#2132582]: https://bugzilla.redhat.com/show_bug.cgi?id=2132582



Re: [ClusterLabs] Pacemaker question

2022-10-05 Thread Tomas Jelinek

Hi,

If you are using pcs to setup your cluster, then the answer is no. I'm 
not sure about crm shell / hawk. Once you have a cluster, you can use 
users other than hacluster as Ken pointed out.


Regards,
Tomas


On 04. 10. 22 at 16:06, Ken Gaillot wrote:

Yes, see ACLs:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#document-acls

On Mon, 2022-10-03 at 15:51 +, Jelen, Piotr wrote:

Dear Clusterlabs team ,

I would like to ask you if there is some possibility to use a
different user (e.g. cephhauser) to authenticate/set up the cluster, or
if there is another method to authenticate/set up the cluster that does
not use the password of the dedicated pacemaker user such as hacluster?


Best Regards
Piotr Jelen
Senior Systems Platform Engineer
  
Mastercard

Mountain View, Central Park  | Leopard
  

  


Re: [ClusterLabs] Corosync over dedicated interface?

2022-10-03 Thread Tomas Jelinek

On 28. 09. 22 at 18:22, Jehan-Guillaume de Rorthais via Users wrote:

Hi,

A small addendum below.

On Wed, 28 Sep 2022 11:42:53 -0400
"Kevin P. Fleming"  wrote:


On Wed, Sep 28, 2022 at 11:37 AM Dave Withheld 
wrote:


Is it possible to get corosync to use the private network and stop trying
to use the LAN for cluster communications? Or am I totally off-base and am
missing something in my drbd/pacemaker configuration?


Absolutely! When I set up my two-node cluster recently I did exactly
that. If you are using 'pcs' to manage your cluster, ensure that you
add the 'addr=' parameter during 'pcs host auth' so that Corosync and
the layers above it will use that address for the host. Something
like:

$ pcs host auth cluster-node-1 addr=192.168.10.1 cluster-node-2
addr=192.168.10.2


You can even set multiple rings so corosync can rely on both:

   $ pcs host auth\
 cluster-node-1 addr=192.168.10.1 addr=10.20.30.1 \
 cluster-node-2 addr=192.168.10.2 addr=10.20.30.2


Hi,

Just a little correction.

The 'pcs host auth' command accepts only one addr= for each node. The 
address will then be used for pcs communication. If you don't put any 
addr= in the 'pcs cluster setup' command, it will be used for corosync 
communication as well.


However, if you want to set corosync to use multiple rings, you do that 
by specifying addr= in the 'pcs cluster setup' command like this:


pcs cluster setup cluster_name \
cluster-node-1 addr=192.168.10.1 addr=10.20.30.1 \
cluster-node-2 addr=192.168.10.2 addr=10.20.30.2

If you used addr= in the 'pcs host auth' command and you want the same 
address to be used by corosync, you need to specify that address in the 
'pcs cluster setup' command. If you only specify the second address, 
you'll end up with a one-ring cluster.
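Once the cluster is created, the number of links corosync actually uses can be
verified on any node, for example with:

   corosync-cfgtool -s

which prints the status of each configured link/ring; a two-ring setup should
also show both a ring0_addr and a ring1_addr for every node in the nodelist
section of /etc/corosync/corosync.conf.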



Regards,
Tomas



Then, compare (but do not edit!) your "/etc/corosync/corosync.conf" on all
nodes.

Regards,


[ClusterLabs] pcs 0.11.3.1 released

2022-09-14 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.3.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.3.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.3.1.zip

This version fixes a high severity security issue CVE-2022-2735.

Complete change log for this release:
## [0.11.3.1] - 2022-09-09

### Security
- CVE-2022-2735 pcs: obtaining an authentication token for hacluster
  user could lead to privilege escalation ([rhbz#2116841])


Thanks / congratulations to everyone who contributed to this release.

Cheers,
Tomas


[rhbz#2116841]: https://bugzilla.redhat.com/show_bug.cgi?id=2116841



[ClusterLabs] pcs 0.10.14.1 released

2022-09-14 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.14.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.14.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.14.1.zip

This version fixes a high severity security issue CVE-2022-2735.

Complete change log for this release:
## [0.10.14.1] - 2022-09-09

### Security
- CVE-2022-2735 pcs: obtaining an authentication token for hacluster
  user could lead to privilege escalation ([rhbz#2116838])


Thanks / congratulations to everyone who contributed to this release.

Cheers,
Tomas


[rhbz#2116838]: https://bugzilla.redhat.com/show_bug.cgi?id=2116838



Re: [ClusterLabs] Fix for CVE-2022-2735 in pcs 0.9 version

2022-09-12 Thread Tomas Jelinek

Hi,

As far as I know, pcs-0.9.x isn't affected by CVE-2022-2735. Therefore, 
no fix for it is planned. Could you explain why you think it is affected?


Both main (pcs-0.11) and pcs-0.10 upstream branches do contain the fix. 
We are working on releasing new versions. In the meantime, you may use 
the top of the branches. Fixed packages have also already been released 
by various Linux distributions.


Regards,
Tomas



On 12. 09. 22 at 8:19, A Gunasekar via Users wrote:

Hi Team,

Please be informed, we have got notified from our security tool that our 
pcs version 0.9 is affected by the *CVE-2022-2735*.


It would be great if you could help us get answers to the queries below.

  * We are currently on RHEL 7.9 and using pcs version 0.9. Is there
any fix planned/available for this affected version (0.9.x) of pcs?
  * From the ClusterLabs portal, we can see that even the pcs 0.10.x and the
main branch 0.11.x released versions don't have a fix for this CVE. So
kindly let us know in which release this CVE fix is planned.

https://github.com/ClusterLabs/pcs/blob/main/CHANGELOG.md


Change Log

[Unreleased]

Security

CVE-2022-2735 pcs: obtaining an authentication token for hacluster
user could lead to privilege escalation (rhbz#2116841)



*Our system Details:-*

OS Version: RHEL 7.9

Cluster lab PCS  version: 0.9


Ericsson 

*Gunasekar A *

Senior Software Engineer

BDGS SA BSS PDU BSS PDG EC CH NGCRS

Mobile: +919894561292

Email ID: a.gunase...@ericsson.com **




Re: [ClusterLabs] node1 and node2 communication time question

2022-08-09 Thread Tomas Jelinek

Hi,

It seems that you are using pcs 0.9.x. That is an old and unmaintained 
version. I really recommend updating it.


I can see that you disabled stonith. This is really a bad practice. A 
cluster cannot and will not function properly without working stonith.


What makes you think the nodes are communicating only every 30 seconds and 
not more often? Setting 'monitor interval=30s' certainly doesn't do such a 
thing.
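As a general note (not specific to this setup): loss of a node is detected by
corosync's totem token timeout, not by resource monitor intervals. The value
corosync is actually running with can be checked with something like:

   corosync-cmapctl | grep totem.token

Lowering the token timeout makes failover start sooner, but it also makes the
cluster much more sensitive to short network hiccups; detection in the range
of tens of milliseconds is not realistic.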


Regards,
Tomas


On 09. 08. 22 at 8:23, 권오성 wrote:

Hello.
I installed linux ha on raspberry pi as below.
1) sudo apt-get install pacemaker pcs fence-agents resource-agents
2) Host Settings
3) sudo reboot
4) sudo passwd hacluster
5) sudo systemctl enable pcsd, sudo systemctl start pcsd, sudo 
systemctl enable pacemaker

6) sudo pcs cluster destroy
7) sudo pcs cluster auth <node1> <node2> -u hacluster -p <password for hacluster>

8) sudo pcs cluster setup --name <cluster name> <node1> <node2>
9) sudo pcs cluster start --all, sudo pcs cluster enable --all
10) sudo pcs property set stonith-enabled=false
11) sudo pcs status
12) sudo pcs resource create VirtualIP ocf:heartbeat:IPaddr2 
ip=<virtual IP> cidr_netmask=24 op monitor interval=30s


So, I've set it up this way.
By the way, is it correct that node1 and node2 communicate every 30 
seconds, and that node2 will notice after 30 seconds when node1 dies?

Or do they communicate every few seconds?
And can node1 and node2 reduce the communication time?
What I want is for node1 and node2 to communicate every 10 ms and switch 
over as fast as possible.

Please answer.
Thank you.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cannot add a node with pcs

2022-08-02 Thread Tomas Jelinek

Hi Piotr,

Sorry for the delay. I'm not a pacemaker expert, so I don't really know 
how pacemaker behaves in various corner cases. Even if I were, it would 
be difficult to advise you, since you haven't even posted what version 
of pacemaker / corosync / pcs you are using.


In any case, the first thing you need to do is configure stonith. 
Properly configured and working stonith is required for a cluster to 
operate. There is no way around it.
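
Purely as an illustration of what that looks like with pcs - every name,
address and credential below is a placeholder, and parameter names vary
between fence agents and fence-agents versions:

  pcs stonith create fence-n1 fence_ipmilan ip=192.0.2.1 \
      username=admin password=secret pcmk_host_list=n1
  pcs stonith create fence-n2 fence_ipmilan ip=192.0.2.2 \
      username=admin password=secret pcmk_host_list=n2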



Regards,
Tomas


Dne 13. 07. 22 v 18:54 Piotr Szafarczyk napsal(a):

Hi Tomas,

Thank you very much for the idea. I have played with stonith_admin 
--unfence and --confirm. Whenever I try, pcs status shows my actions 
under Failed Fencing Actions. I see this in the log file:


error: Unfencing of n2 by  failed: No such device

No surprise here, since I have not got any devices registered.

If fencing of n2 was a cause, I would expect pcs status to show it as 
offline or unhealthy, but still show it. I have got:


   * 2 nodes configured

Also I would expect node remove + node clear + node add to make n2 a 
brand new node.


Here are parts of the log when I remove n2 from the cluster

No peers with id=0 and/or uname=n2 to purge from the membership cache
Removing all n2 attributes for peer n3
Removing all n2 attributes for peer n1
Instructing peers to remove references to node n2/0
Completed cib_delete operation for section status: OK

There is nothing in the log file when I add it.

If fencing is the cause, where should I look for what the cluster tries 
to do?


Have you got any other suggestions what to check?

Best regards,
Piotr

On 12.07.2022 12:50, Tomas Jelinek wrote:

Hi Piotr,

Based on 'pcs cluster node add n2' and 'pcs config' outputs, pcs added 
the node to your cluster successfully, that is corosync config has 
been modified, distributed and loaded.


It looks like the problem is with pacemaker. This is a wild guess, but 
maybe pacemaker wants to fence n2, which is not possible, as you 
disabled stonith. In the meantime, n1 and n3 do not allow n2 to join, 
until it's confirmed fenced. Try looking into / posting 'pcs status 
--full' and pacemaker log.


With stonith disabled, you have a working cluster (seemingly). Until 
you don't, due to an event which requires working stonith for the 
cluster to recover.


Regards,
Tomas


Dne 12. 07. 22 v 12:34 Piotr Szafarczyk napsal(a):

Hi,

I used to have a working cluster with 3 nodes (and stonith disabled). 
After an unexpected restart of one node, the cluster split. The node 
#2 started to see the others as unclean. Nodes 1 and 3 were 
cooperating with each other, showing #2 as offline. There were no 
network connection problems.


I removed #2 (operating from #1) with
pcs cluster node remove n2

I verified that it had removed all configuration from #2, both for 
corosync and for pacemaker. The cluster looks like working correctly 
with two nodes (and no traces of #2).


Now I am trying to add the third node back.
pcs cluster node add n2
Disabling SBD service...
n2: sbd disabled
Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
n2: successful distribution of the file 'corosync authkey'
n2: successful distribution of the file 'pacemaker authkey'
Sending updated corosync.conf to nodes...
n3: Succeeded
n2: Succeeded
n1: Succeeded
n3: Corosync configuration reloaded

I am able to start #2 operating from #1

pcs cluster pcsd-status
   n2: Online
   n3: Online
   n1: Online

pcs cluster enable n2
pcs cluster start n2

I can see that corosync's configuration has been updated, but 
pacemaker's not.


_Checking from #1:_

pcs config
Cluster Name: n
Corosync Nodes:
  n1 n3 n2
Pacemaker Nodes:
  n1 n3
[...]

pcs status
   * 2 nodes configured
Node List:
   * Online: [ n1 n3 ]
[...]

pcs cluster cib scope=nodes

   
   


_#2 is seeing the state differently:_

pcs config
Cluster Name: n
Corosync Nodes:
  n1 n3 n2
Pacemaker Nodes:
  n1 n2 n3

pcs status
   * 3 nodes configured
Node List:
   * Online: [ n2 ]
   * OFFLINE: [ n1 n3 ]
Full List of Resources:
   * No resources
[...]
(there are resources configured on #1 and #3)

pcs cluster cib scope=nodes

   
   
   


Help me diagnose it please. Where should I look for the problem? (I 
have already tried a few things more - I see nothing helpful in log 
files, pcs --debug shows nothing suspicious, tried even editing the 
CIB manually)


Best regards,

Piotr Szafarczyk


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



Re: [ClusterLabs] Cannot add a node with pcs

2022-07-12 Thread Tomas Jelinek

Hi Piotr,

Based on 'pcs cluster node add n2' and 'pcs config' outputs, pcs added 
the node to your cluster successfully, that is corosync config has been 
modified, distributed and loaded.


It looks like the problem is with pacemaker. This is a wild guess, but 
maybe pacemaker wants to fence n2, which is not possible, as you 
disabled stonith. In the meantime, n1 and n3 do not allow n2 to join, 
until it's confirmed fenced. Try looking into / posting 'pcs status 
--full' and pacemaker log.


With stonith disabled, you have a working cluster (seemingly). Until you 
don't, due to an event which requires working stonith for the cluster to 
recover.


Regards,
Tomas


Dne 12. 07. 22 v 12:34 Piotr Szafarczyk napsal(a):

Hi,

I used to have a working cluster with 3 nodes (and stonith disabled). 
After an unexpected restart of one node, the cluster split. The node #2 
started to see the others as unclean. Nodes 1 and 3 were cooperating 
with each other, showing #2 as offline. There were no network connection 
problems.


I removed #2 (operating from #1) with
pcs cluster node remove n2

I verified that it had removed all configuration from #2, both for 
corosync and for pacemaker. The cluster looks like working correctly 
with two nodes (and no traces of #2).


Now I am trying to add the third node back.
pcs cluster node add n2
Disabling SBD service...
n2: sbd disabled
Sending 'corosync authkey', 'pacemaker authkey' to 'n2'
n2: successful distribution of the file 'corosync authkey'
n2: successful distribution of the file 'pacemaker authkey'
Sending updated corosync.conf to nodes...
n3: Succeeded
n2: Succeeded
n1: Succeeded
n3: Corosync configuration reloaded

I am able to start #2 operating from #1

pcs cluster pcsd-status
   n2: Online
   n3: Online
   n1: Online

pcs cluster enable n2
pcs cluster start n2

I can see that corosync's configuration has been updated, but 
pacemaker's not.


_Checking from #1:_

pcs config
Cluster Name: n
Corosync Nodes:
  n1 n3 n2
Pacemaker Nodes:
  n1 n3
[...]

pcs status
   * 2 nodes configured
Node List:
   * Online: [ n1 n3 ]
[...]

pcs cluster cib scope=nodes

   
   


_#2 is seeing the state differently:_

pcs config
Cluster Name: n
Corosync Nodes:
  n1 n3 n2
Pacemaker Nodes:
  n1 n2 n3

pcs status
   * 3 nodes configured
Node List:
   * Online: [ n2 ]
   * OFFLINE: [ n1 n3 ]
Full List of Resources:
   * No resources
[...]
(there are resources configured on #1 and #3)

pcs cluster cib scope=nodes

   
   
   


Help me diagnose it please. Where should I look for the problem? (I have 
already tried a few things more - I see nothing helpful in log files, 
pcs --debug shows nothing suspicious, tried even editing the CIB manually)


Best regards,

Piotr Szafarczyk


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.11.3 released

2022-06-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.3.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.11.3.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.11.3.zip

The most notable feature of this release is exporting resource
configuration in the form of pcs commands which can be used to recreate
the same configuration.
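
For illustration, the new formats are selected with the --output-format
option (a sketch; see the change log entries below for the exact scope):

  pcs resource config --output-format=cmd
  pcs stonith config --output-format=json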

Complete change log for this release:
## [0.11.3] - 2022-06-23

### Security
- CVE-2022-1049: Pcs daemon was allowing expired accounts, and accounts
  with expired passwords to login when using PAM auth. ([huntr#220307],
  [rhbz#2068457])
- Pcsd does not expose the server name in HTTP headers anymore
  ([rhbz#2059122])
- Set `Strict-Transport-Security: max-age=63072000` HTTP header for all
  responses ([rhbz#2097731])
- Set HTTP headers to prevent caching everything except static files
  ([rhbz#2097733])
- Set HTTP headers to prevent sending referrer ([rhbz#2097732])
- Set cookie option SameSite to Lax ([rhbz#2097730])
- Set `Content-Security-Policy: frame-ancestors 'self'; default-src
  'self'` HTTP header for all responses ([rhbz#2097778])

### Added
- Add support for fence_mpath to `pcs stonith update-scsi-devices`
  command ([rhbz#2024522])
- Support for cluster UUIDs. New clusters now get a UUID during setup.
  Existing clusters can get a UUID by running the new `pcs cluster
  config uuid generate` command ([rhbz#2054671])
- Add warning regarding move constraints to `pcs status`
  ([rhbz#2058247])
- Support for output formats `json` and `cmd` to `pcs resource config`
  and `pcs stonith config` commands ([rhbz#2058251], [rhbz#2058252])

### Fixed
- Booth ticket name validation ([rhbz#2053177])
- Adding booth ticket doesn't report 'mode' as an unknown option anymore
  ([rhbz#2058243])
- Preventing fence-loop caused when stonith-watchdog-timeout is set with
  wrong value ([rhbz#2058246])
- Do not allow to create an order constraint for resources in one group
  as that may block Pacemaker ([ghpull#509])
- `pcs quorum device remove` works again ([rhbz#2095695])
- Fixed description of full permission ([rhbz#2059177])


Thanks / congratulations to everyone who contributed to this release,
including Alessandro Barbieri, Ivan Devat, lixin, Michal Pospisil,
Miroslav Lisik, Ondrej Mular, Tomas Jelinek and ysf.

Cheers,
Tomas


[ghpull#509]: https://github.com/ClusterLabs/pcs/pull/509
[rhbz#2024522]: https://bugzilla.redhat.com/show_bug.cgi?id=2024522
[rhbz#2053177]: https://bugzilla.redhat.com/show_bug.cgi?id=2053177
[rhbz#2054671]: https://bugzilla.redhat.com/show_bug.cgi?id=2054671
[rhbz#2058243]: https://bugzilla.redhat.com/show_bug.cgi?id=2058243
[rhbz#2058246]: https://bugzilla.redhat.com/show_bug.cgi?id=2058246
[rhbz#2058247]: https://bugzilla.redhat.com/show_bug.cgi?id=2058247
[rhbz#2058251]: https://bugzilla.redhat.com/show_bug.cgi?id=2058251
[rhbz#2058252]: https://bugzilla.redhat.com/show_bug.cgi?id=2058252
[rhbz#2059122]: https://bugzilla.redhat.com/show_bug.cgi?id=2059122
[rhbz#2059177]: https://bugzilla.redhat.com/show_bug.cgi?id=2059177
[rhbz#2068457]: https://bugzilla.redhat.com/show_bug.cgi?id=2068457
[rhbz#2095695]: https://bugzilla.redhat.com/show_bug.cgi?id=2095695
[rhbz#2097730]: https://bugzilla.redhat.com/show_bug.cgi?id=2097730
[rhbz#2097731]: https://bugzilla.redhat.com/show_bug.cgi?id=2097731
[rhbz#2097732]: https://bugzilla.redhat.com/show_bug.cgi?id=2097732
[rhbz#2097733]: https://bugzilla.redhat.com/show_bug.cgi?id=2097733
[rhbz#2097778]: https://bugzilla.redhat.com/show_bug.cgi?id=2097778
[huntr#220307]: 
https://huntr.dev/bounties/7aa921fc-a568-4fd8-96f4-7cd826246aa5/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.10.14 released

2022-06-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.14.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.14.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.14.zip

The most notable feature of this release is exporting resource
configuration in the form of pcs commands which can be used to recreate
the same configuration.

Complete change log for this release:
## [0.10.14] - 2022-06-23

### Security
- CVE-2022-1049: Pcs daemon was allowing expired accounts, and accounts
  with expired passwords to login when using PAM auth. ([huntr#220307],
  [rhbz#2068456])
- Pcsd does not expose the server name in HTTP headers anymore
  ([rhbz#2058278])
- Set `Strict-Transport-Security: max-age=63072000` HTTP header for all
  responses ([rhbz#2097392])
- Set HTTP headers to prevent caching everything except static files
  ([rhbz#2097383])
- Set HTTP headers to prevent sending referrer ([rhbz#2097391])
- Set cookie option SameSite to Lax ([rhbz#2097393])

### Added
- Add support for fence_mpath to `pcs stonith update-scsi-devices`
  command ([rhbz#2023845])
- Support for cluster UUIDs. New clusters now get a UUID during setup.
  Existing clusters can get a UUID by running the new `pcs cluster
  config uuid generate` command ([rhbz#1950551])
- Add warning regarding move constraints to `pcs status`
  ([rhbz#1730232])
- Support for output formats `json` and `cmd` to `pcs resource config`
  and `pcs stonith config` commands ([rhbz#1874624], [rhbz#1909904])

### Fixed
- Agents not conforming to OCF standard are processed as if they
  conformed to OCF 1.0 - in the same way as before pcs-0.10.12
  ([rhbz#2050274])
- OCF 1.0 agents not conforming to the schema are processed anyway
  ([rhbz#2050274])
- Booth ticket name validation ([rhbz#1791661])
- Adding booth ticket doesn't report 'mode' as an unknown option anymore
  ([rhbz#1786964])
- Preventing fence-loop caused when stonith-watchdog-timeout is set with
  wrong value ([rhbz#1954099])
- Do not allow to create an order constraint for resources in one group
  as that may block Pacemaker ([ghpull#509])

### Deprecated
- Agents not complying with OCF 1.0 schema are processed,
  incompatibilities are listed as warnings. In pcs-0.11, they will be
  reported as errors and prevent pcs from working with such agents.
  ([rhbz#2050274])


Thanks / congratulations to everyone who contributed to this release,
including Alessandro Barbieri, Ivan Devat, lixin, Michal Pospisil,
Miroslav Lisik, Ondrej Mular, Tomas Jelinek and ysf.

Cheers,
Tomas


[ghpull#509]: https://github.com/ClusterLabs/pcs/pull/509
[rhbz#1730232]: https://bugzilla.redhat.com/show_bug.cgi?id=1730232
[rhbz#1786964]: https://bugzilla.redhat.com/show_bug.cgi?id=1786964
[rhbz#1791661]: https://bugzilla.redhat.com/show_bug.cgi?id=1791661
[rhbz#1874624]: https://bugzilla.redhat.com/show_bug.cgi?id=1874624
[rhbz#1909904]: https://bugzilla.redhat.com/show_bug.cgi?id=1909904
[rhbz#1950551]: https://bugzilla.redhat.com/show_bug.cgi?id=1950551
[rhbz#1954099]: https://bugzilla.redhat.com/show_bug.cgi?id=1954099
[rhbz#2023845]: https://bugzilla.redhat.com/show_bug.cgi?id=2023845
[rhbz#2050274]: https://bugzilla.redhat.com/show_bug.cgi?id=2050274
[rhbz#2058278]: https://bugzilla.redhat.com/show_bug.cgi?id=2058278
[rhbz#2068456]: https://bugzilla.redhat.com/show_bug.cgi?id=2068456
[rhbz#2097383]: https://bugzilla.redhat.com/show_bug.cgi?id=2097383
[rhbz#2097391]: https://bugzilla.redhat.com/show_bug.cgi?id=2097391
[rhbz#2097392]: https://bugzilla.redhat.com/show_bug.cgi?id=2097392
[rhbz#2097393]: https://bugzilla.redhat.com/show_bug.cgi?id=2097393
[huntr#220307]: 
https://huntr.dev/bounties/7aa921fc-a568-4fd8-96f4-7cd826246aa5/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] No node name in corosync-cmapctl output

2022-05-31 Thread Tomas Jelinek

Dne 31. 05. 22 v 15:34 Jan Friesse napsal(a):

Hi,

On 31/05/2022 15:16, Andreas Hasenack wrote:

Hi,

corosync 3.1.6
pacemaker 2.1.2
crmsh 4.3.1

TL;DR
I only seem to get a "name" attribute in the "corosync-cmapctl | grep
nodelist" output if I set an explicit name in corosync.conf's
nodelist. If I rely on the default of "name will be uname -n if it's
not set", I get nothing.



wondering where the problem is? name is not set, so it's not in cmap, which 
is (mostly) a 1:1 mapping of the config file. So this is expected, not a bug.


If I remember correctly, this "name will be uname -n if it's not set" is 
done in pacemaker. It is expected such name is not present in cmapctl 
output, as Honza said.


Regards,
Tomas



I formed a test cluster of 3 nodes, and I'm not setting the name
attribute in the nodelist, so that it defaults to `uname -n`:
nodelist {
 node {
  nodeid: 1
 ring0_addr: k1
 }
 node {
 nodeid: 2
 ring0_addr: k2
 }
 node {
 nodeid: 3
 ring0_addr: k3
 }
}

The addresses "k1", "k2" and "k3" are fully resolvable (I know IPs are
better, but for this quick test it was simpler to use the hostnames).

crm status is happy:
root@k1:~# crm status
Cluster Summary:
   * Stack: corosync
   * Current DC: k3 (version 2.1.2-ada5c3b36e2) - partition with quorum
   * Last updated: Tue May 31 12:53:02 2022
   * Last change:  Tue May 31 12:51:55 2022 by hacluster via crmd on k3
   * 3 nodes configured
   * 0 resource instances configured

Node List:
   * Online: [ k1 k2 k3 ]

Full List of Resources:
   * No resources


But there is no node name in the corosync-cmapctl output:

root@k1:~# corosync-cmapctl |grep nodelist
nodelist.local_node_pos (u32) = 0
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.ring0_addr (str) = k1
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.ring0_addr (str) = k2
nodelist.node.2.nodeid (u32) = 3
nodelist.node.2.ring0_addr (str) = k3

I was expecting to have entries like "nodelist.node.0.name = k1" in
that output. Apparently I only get that if I explicitly set a node
name in nodelist.

For example, if I set the name of nodeid 1 to "explicit1":
 node {
 name: explicit1
 nodeid: 1
 ring0_addr: k1
 }

Then I get the name attribute for that nodeid only:
# corosync-cmapctl |grep nodelist
nodelist.local_node_pos (u32) = 0
nodelist.node.0.name (str) = explicit1
nodelist.node.0.nodeid (u32) = 1
nodelist.node.0.ring0_addr (str) = k1
nodelist.node.1.nodeid (u32) = 2
nodelist.node.1.ring0_addr (str) = k2
nodelist.node.2.nodeid (u32) = 3
nodelist.node.2.ring0_addr (str) = k3

Why not also use "uname -n" when "name" is not explicitly set in the
corosync nodelist config?


Can you please share the use case for this behavior? It shouldn't be hard 
to implement.


Regards,
   Honza


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Tomas Jelinek

Dne 21. 04. 22 v 17:26 john tillman napsal(a):

Dne 20. 04. 22 v 20:21 john tillman napsal(a):

On 20.04.2022 19:53, john tillman wrote:

I have a two node cluster that won't start any resources if only one
node
is booted; the pacemaker service does not start.

Once the second node boots up, the first node will start pacemaker and
the
resources are started.  All is well.  But I would like the resources
to
start when the first node boots by itself.

I thought the problem was with the wait_for_all option but I have it
set
to "0".

On the node that is booted by itself, when I run "corosync-quorumtool"
I
see:

 [root@test00 ~]# corosync-quorumtool
 Quorum information
 --
 Date: Wed Apr 20 16:05:07 2022
 Quorum provider:  corosync_votequorum
 Nodes:1
 Node ID:  1
 Ring ID:  1.2f
 Quorate:  Yes

 Votequorum information
 --
 Expected votes:   2
 Highest expected: 2
 Total votes:  1
 Quorum:   1
 Flags:            2Node Quorate

 Membership information
 --
 Nodeid  Votes Name
  1  1 test00 (local)


My config file look like this:
 totem {
 version: 2
 cluster_name: testha
 transport: knet
 crypto_cipher: aes256
 crypto_hash: sha256
 }

 nodelist {
 node {
 ring0_addr: test00
 name: test00
 nodeid: 1
 }

 node {
 ring0_addr: test01
 name: test01
 nodeid: 2
 }
 }

 quorum {
 provider: corosync_votequorum
 two_node: 1
 wait_for_all: 0
 }

 logging {
 to_logfile: yes
 logfile: /var/log/cluster/corosync.log
 to_syslog: yes
 timestamp: on
 debug: on
 syslog_priority: debug
 logfile_priority: debug
 }

Fencing is disabled.



That won't work.


I've also looked in "corosync.log" but I don't know what to look for
to
diagnose this issue.  I mean there are many lines similar to:
[QUORUM] This node is within the primary component and will provide
service.
and
[VOTEQ ] Sending quorum callback, quorate = 1
and
[VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No

Is there something specific I should look for in the log?

So can a two node cluster work after booting only one node?  Maybe it
never will and I am wasting a lot of time, yours and mine.

If it can, what else can I investigate further?



Before node can start handling resources it needs to know status of
other node. Without successful fencing there is no way to accomplish
it.

Yes, you can tell pacemaker to ignore unknown status. Depending on your
resources this could simply prevent normal work or lead to data
corruption.



Makes sense.  Thank you.

Perhaps some future enhancement could allow for this situation?  I mean,
it might be desirable for some cases to allow for a single node to boot,
determine quorum by two_node=1 and wait_for_all=0, and start resources
without ever seeing the other node.  Sure, there are dangers of split
brain but I can see special cases where I want the node to work alone
for
a period of time despite the danger.



Hi John,

How about 'pcs quorum unblock'?

Regards,
Tomas




Tomas,

Thank you for the suggestion.  However it didn't work.  It returned:
Error: unable to check quorum status
   crm_mon: Error: cluster is not available on this node
I checked pacemaker, just in case, and it still isn't running.



John,

As discussed in other branches of this thread, you need to figure out 
why pacemaker is not starting. Even if one node is not running, corosync 
and pacemaker are expected to be able to start on the other node. Then, 
once you unblock quorum on the running node, it will start cluster 
resources.
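
A rough sketch of that sequence on the surviving node (assuming systemd
and pcs 0.10 or later; read the warning 'pcs quorum unblock' prints and
only confirm if the other node is verifiably powered off):

  systemctl status corosync pacemaker   # why is pacemaker not running?
  journalctl -u pacemaker -b            # look for the actual startup error
  pcs cluster start                     # start corosync and pacemaker locally
  pcs quorum unblock                    # only if the other node is down for sure
  pcs status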



I'm very curious how I could convince the cluster to start its resources on
one node in the event that the other node is not able to boot.  But I'm


Figure out why pacemaker is not starting.


afraid the answer is either to use fencing or add a third node to the
cluster or both.


Well, fencing is absolutely needed in any case.


Tomas



-John



Thank you again.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-21 Thread Tomas Jelinek

Dne 20. 04. 22 v 20:21 john tillman napsal(a):

On 20.04.2022 19:53, john tillman wrote:

I have a two node cluster that won't start any resources if only one
node
is booted; the pacemaker service does not start.

Once the second node boots up, the first node will start pacemaker and
the
resources are started.  All is well.  But I would like the resources to
start when the first node boots by itself.

I thought the problem was with the wait_for_all option but I have it set
to "0".

On the node that is booted by itself, when I run "corosync-quorumtool" I
see:

[root@test00 ~]# corosync-quorumtool
Quorum information
--
Date: Wed Apr 20 16:05:07 2022
Quorum provider:  corosync_votequorum
Nodes:1
Node ID:  1
Ring ID:  1.2f
Quorate:  Yes

Votequorum information
--
Expected votes:   2
Highest expected: 2
Total votes:  1
Quorum:   1
Flags:            2Node Quorate

Membership information
--
Nodeid  Votes Name
 1  1 test00 (local)


My config file look like this:
totem {
version: 2
cluster_name: testha
transport: knet
crypto_cipher: aes256
crypto_hash: sha256
}

nodelist {
node {
ring0_addr: test00
name: test00
nodeid: 1
}

node {
ring0_addr: test01
name: test01
nodeid: 2
}
}

quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 0
}

logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: on
debug: on
syslog_priority: debug
logfile_priority: debug
}

Fencing is disabled.



That won't work.


I've also looked in "corosync.log" but I don't know what to look for to
diagnose this issue.  I mean there are many lines similar to:
[QUORUM] This node is within the primary component and will provide
service.
and
[VOTEQ ] Sending quorum callback, quorate = 1
and
[VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No

Is there something specific I should look for in the log?

So can a two node cluster work after booting only one node?  Maybe it
never will and I am wasting a lot of time, yours and mine.

If it can, what else can I investigate further?



Before node can start handling resources it needs to know status of
other node. Without successful fencing there is no way to accomplish it.

Yes, you can tell pacemaker to ignore unknown status. Depending on your
resources this could simply prevent normal work or lead to data
corruption.



Makes sense.  Thank you.

Perhaps some future enhancement could allow for this situation?  I mean,
it might be desirable for some cases to allow for a single node to boot,
determine quorum by two_node=1 and wait_for_all=0, and start resources
without ever seeing the other node.  Sure, there are dangers of split
brain but I can see special cases where I want the node to work alone for
a period of time despite the danger.



Hi John,

How about 'pcs quorum unblock'?

Regards,
Tomas


Thank you again.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pcs cluster auth with a key instead of password #179

2022-03-30 Thread Tomas Jelinek

Hi,

This issue is not planned to be worked on in the near future. It is 
still on our todo list. However, as of now, I cannot give you any 
estimate of when the feature would land in pcs.


In any case, pcs-0.9 is no longer maintained and therefore the feature 
will not be added there.


Regards,
Tomas


Dne 29. 03. 22 v 13:04 S Sathish S via Users napsal(a):

Hi Team,

We are using the hacluster user to perform pcs cluster auth, and for that 
user we don't set password expiry, to avoid any impact on pcs functionality.

But as per security best practice it is not recommended to set a password 
to never expire for any OS user account, so we are planning to change pcs 
cluster auth to use a key instead of a password, and there we hit a known 
limitation in ClusterLabs. Can we know when we can expect a fix for the 
below open issue?


pcs cluster auth with a key instead of password · Issue #179 · 
ClusterLabs/pcs · GitHub 


[root@node01 ]# rpm -qa | grep -Ei 'pcs|pacemaker|corosync'

pacemaker-2.0.2-2.el7.x86_64

corosync-2.4.4-2.el7.x86_64

pcs-0.9.169-1.el7.x86_64

[root@node01 ]#

Thanks and Regards,
S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.10.12 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.12.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.12.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.12.zip

The main feature added in this release is support for OCF 1.1 agents.


Complete change log for this release:
## [0.10.12] - 2021-11-30

### Added
- Option `--autodelete` of command `pcs resource move` is fully
  supported ([rhbz#1990784])
- Support for OCF 1.1 resource and stonith agents ([rhbz#2018969])

### Fixed
- Do not show warning that no stonith device was detected and
  stonith-enabled is not false when a stonith device is in a group
  ([ghpull#370])
- Misleading error message from `pcs quorum unblock` when
  `wait_for_all=0` ([rhbz#1968088])
- Misleading error message from `pcs booth setup` and `pcs booth pull`
  when booth config directory (`/etc/booth`) is missing ([rhbz#1791670],
  [ghpull#411], [ghissue#225])


Thanks / congratulations to everyone who contributed to this release,
including Hideo Yamauchi, Ivan Devat, Michal Pospisil, Miroslav Lisik,
Ondrej Mular, Tomas Jelinek and vivi.

Cheers,
Tomas


[ghissue#225]: https://github.com/ClusterLabs/pcs/issues/225
[ghpull#370]: https://github.com/ClusterLabs/pcs/pull/370
[ghpull#411]: https://github.com/ClusterLabs/pcs/pull/411
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1968088]: https://bugzilla.redhat.com/show_bug.cgi?id=1968088
[rhbz#1990784]: https://bugzilla.redhat.com/show_bug.cgi?id=1990784
[rhbz#2018969]: https://bugzilla.redhat.com/show_bug.cgi?id=2018969

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.11.2 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.2.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.2.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.2.zip

Complete change log for this release:
## [0.11.2] - 2022-02-01

### Fixed
- Pcs was not automatically enabling corosync-qdevice when adding a
  quorum device to a cluster (broken since pcs-0.10.9) ([rhbz#2028902])
- `resource update` command exiting with a traceback when updating a
  resource with a non-existing resource agent ([rhbz#2019836])
- pcs\_snmp\_agent is working again (broken since pcs-0.10.1)
  ([ghpull#431])
- Skip checking of scsi devices to be removed before unfencing to be
  added devices ([rhbz#2033248])
- Make `ocf:linbit:drbd` agent pass OCF standard validation
  ([ghissue#441], [rhbz#2036633])
- Multiple improvements of `pcs resource move` command ([rhbz#1996062])
- Pcs no longer creates Pacemaker-1.x CIB when `-f` is used, so running
  `pcs cluster cib-upgrade` manually is not needed ([rhbz#2022463])

### Deprecated
- Usage of `pcs resource` commands for stonith resources and vice versa
  ([rhbz#1301204])


Thanks / congratulations to everyone who contributed to this release,
including Fabio M. Di Nitto, Miroslav Lisik, Ondrej Mular, Tomas Jelinek
and Valentin Vidić.

Cheers,
Tomas


[ghissue#441]: https://github.com/ClusterLabs/pcs/issues/441
[ghpull#431]: https://github.com/ClusterLabs/pcs/pull/431
[rhbz#1301204]: https://bugzilla.redhat.com/show_bug.cgi?id=1301204
[rhbz#1996062]: https://bugzilla.redhat.com/show_bug.cgi?id=1996062
[rhbz#2019836]: https://bugzilla.redhat.com/show_bug.cgi?id=2019836
[rhbz#2022463]: https://bugzilla.redhat.com/show_bug.cgi?id=2022463
[rhbz#2028902]: https://bugzilla.redhat.com/show_bug.cgi?id=2028902
[rhbz#2033248]: https://bugzilla.redhat.com/show_bug.cgi?id=2033248
[rhbz#2036633]: https://bugzilla.redhat.com/show_bug.cgi?id=2036633

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.10.13 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.13.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.10.13.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.10.13.zip


Complete change log for this release:
## [0.10.13] - 2022-01-31

### Fixed
- Pcs was not automatically enabling corosync-qdevice when adding a
  quorum device to a cluster (broken since pcs-0.10.9) ([rhbz#2028902])
- `resource update` command exiting with a traceback when updating a
  resource with a non-existing resource agent ([rhbz#1384485])
- pcs\_snmp\_agent is working again (broken since pcs-0.10.1)
  ([ghpull#431])
- Skip checking of scsi devices to be removed before unfencing to be
  added devices ([rhbz#2032997])
- Make `ocf:linbit:drbd` agent pass OCF standard validation
  ([ghissue#441], [rhbz#2036633])
- Multiple improvements of `pcs resource move --autodelete` command
  ([rhbz#1990784])
- Pcs no longer creates Pacemaker-1.x CIB when `-f` is used, so running
  `pcs cluster cib-upgrade` manually is not needed ([rhbz#2022463])


Thanks / congratulations to everyone who contributed to this release,
including Miroslav Lisik, Ondrej Mular, Tomas Jelinek and Valentin
Vidić.

Cheers,
Tomas


[ghissue#441]: https://github.com/ClusterLabs/pcs/issues/441
[ghpull#431]: https://github.com/ClusterLabs/pcs/pull/431
[rhbz#1384485]: https://bugzilla.redhat.com/show_bug.cgi?id=1384485
[rhbz#1990784]: https://bugzilla.redhat.com/show_bug.cgi?id=1990784
[rhbz#2022463]: https://bugzilla.redhat.com/show_bug.cgi?id=2022463
[rhbz#2028902]: https://bugzilla.redhat.com/show_bug.cgi?id=2028902
[rhbz#2032997]: https://bugzilla.redhat.com/show_bug.cgi?id=2032997
[rhbz#2036633]: https://bugzilla.redhat.com/show_bug.cgi?id=2036633

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.11.1 released

2022-03-24 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.11.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/v0.11.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/v0.11.1.zip

This is the first release of the pcs-0.11 branch. The branch fully
supports clusters running Corosync 3.x and Pacemaker 2.1.x. In the
meantime, we are still providing pcs-0.10 branch, supporting Pacemaker
2.0.x and 2.1.x compiled with '--enable-compat-2.0' option.

The most important changes for this release:
* Support for OCF 1.1 agents
* Manually moving resources without leaving location constraints behind
* Errors, warnings and progress related output are now printed to stderr
  instead of stdout
* Role names "Promoted" and "Unpromoted" are preferred. Legacy names
  "Master" and "Slave" still work, but they are deprecated.


Complete change log for this release:
## [0.11.1] - 2021-11-30

### Removed
- Deprecated obsolete commands `pcs config import-cman` and `pcs config
  export pcs-commands|pcs-commands-verbose` have been removed
  ([rhbz#1881064])
- Unused and unmaintained pcsd urls: `/remote/config_backup`,
  `/remote/node_available`, `/remote/resource_status`
- Pcsd no longer provides data in format used by web UI in pcs 0.9.142
  and older

### Added
- Explicit confirmation is now required to prevent accidental destroying
  of the cluster with `pcs cluster destroy` ([rhbz#1283805])
-  Add add/remove cli syntax for command `pcs stonith
   update-scsi-devices` ([rhbz#1992668])
- Command `pcs resource move` is fully supported ([rhbz#1990787])
- Support for OCF 1.1 resource and stonith agents ([rhbz#2018969])

### Changed
- Pcs no longer depends on python3-distro package
- 'pcs status xml' now prints cluster status in the new format provided
  by Pacemaker 2.1 ([rhbz#1985981])
- All errors, warning and progress related output is now printed to
  stderr instead of stdout
- Make roles `Promoted` and `Unpromoted` default ([rhbz#1885293])
- Make auto-deleting constraint default for `pcs resource move` command
  ([rhbz#1996062])
- Deprecation warnings use a "Deprecation Warning:" prefix instead of
  "Warning:" on the command line
- Minimal required version of python has been changed to 3.9
- Minimal required version of ruby has been changed to 2.5
- Minimal supported version of pacemaker is 2.1

### Fixed
- Do not unfence newly added devices on fenced cluster nodes
  ([rhbz#1991654])
- Fix displaying fencing levels with regular expression targets
  ([rhbz#1533090])
- Reject cloning of stonith resources ([rhbz#1811072])
- Do not show warning that no stonith device was detected and
  stonith-enabled is not false when a stonith device is in a group
  ([ghpull#370])
- Misleading error message from `pcs quorum unblock` when
  `wait_for_all=0` ([rhbz#1968088])
- Misleading error message from `pcs booth setup` and `pcs booth pull`
  when booth config directory (`/etc/booth`) is missing ([rhbz#1791670],
  [ghpull#411], [ghissue#225])

### Deprecated
- Legacy role names `Master` and `Slave` ([rhbz#1885293])
- Option `--master` is deprecated and has been replaced by option
  `--promoted` ([rhbz#1885293])


Thanks / congratulations to everyone who contributed to this release,
including Hideo Yamauchi, Ivan Devat, Michal Pospisil, Michele
Baldessari, Miroslav Lisik, Ondrej Mular, Tomas Jelinek and vivi.

Cheers,
Tomas


[ghissue#225]: https://github.com/ClusterLabs/pcs/issues/225
[ghpull#370]: https://github.com/ClusterLabs/pcs/pull/370
[ghpull#411]: https://github.com/ClusterLabs/pcs/pull/411
[rhbz#1283805]: https://bugzilla.redhat.com/show_bug.cgi?id=1283805
[rhbz#1533090]: https://bugzilla.redhat.com/show_bug.cgi?id=1533090
[rhbz#1791670]: https://bugzilla.redhat.com/show_bug.cgi?id=1791670
[rhbz#1811072]: https://bugzilla.redhat.com/show_bug.cgi?id=1811072
[rhbz#1881064]: https://bugzilla.redhat.com/show_bug.cgi?id=1881064
[rhbz#1885293]: https://bugzilla.redhat.com/show_bug.cgi?id=1885293
[rhbz#1968088]: https://bugzilla.redhat.com/show_bug.cgi?id=1968088
[rhbz#1985981]: https://bugzilla.redhat.com/show_bug.cgi?id=1985981
[rhbz#1990787]: https://bugzilla.redhat.com/show_bug.cgi?id=1990787
[rhbz#1991654]: https://bugzilla.redhat.com/show_bug.cgi?id=1991654
[rhbz#1992668]: https://bugzilla.redhat.com/show_bug.cgi?id=1992668
[rhbz#1996062]: https://bugzilla.redhat.com/show_bug.cgi?id=1996062
[rhbz#2018969]: https://bugzilla.redhat.com/show_bug.cgi?id=2018969

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PAF with postgresql 13?

2022-03-09 Thread Tomas Jelinek

Dne 08. 03. 22 v 23:08 Ken Gaillot napsal(a):

On Tue, 2022-03-08 at 17:20 +0100, Jehan-Guillaume de Rorthais wrote:

Hi,

Sorry, your mail was really hard to read on my side, but I think I
understood
and try to answer bellow.

On Tue, 8 Mar 2022 11:45:30 +
lejeczek via Users  wrote:


On 08/03/2022 10:21, Jehan-Guillaume de Rorthais wrote:

>> op start timeout=60s \ op stop timeout=60s \ op promote timeout=30s \
>> op demote timeout=120s \ op monitor interval=15s timeout=10s
>> role="Master" meta master-max=1 \ op monitor interval=16s
>> timeout=10s role="Slave" \ op notify timeout=60s meta notify=true

> Because "op" appears, we are back in resource ("pgsqld") context,
> anything after is interpreted as resource and operation attributes,
> even the "meta notify=true". That's why your pgsqld-clone doesn't
> have the meta attribute "notify=true" set.

Here is a one-liner that should do it - add, as per the 'debug-'
suggestion, 'master-max=1'


What debug- suggestion??

...

then do:

-> $ pcs resource delete pgsqld

'-clone' should get removed too, so now no 'pgsqld'
resource(s) but cluster - weirdly in my mind - leaves node
attributes on.


indeed.


I see 'master-pgsqld' with each node and do not see why
'node attributes' should be kept(certainly shown) for
non-existent resources(to which only resources those attrs
are instinct)
So, you want to "clean" that for, perhaps for now you are
not going to have/use 'pgsqlms', you can do that with:

-> $ pcs node attribute node1 master-pgsqld="" # same for
remaining nodes


indeed.


now .. ! repeat your one-liner which worked just a moment
ago and you should get exact same or similar errors(while
all nodes are stuck on 'slave'


You have no promotion because your PostgreSQL instances have been
stopped
in standby mode. The cluster has no way and no score to promote one
of them.


-> $ pcs resource debug-promote pgsqld
crm_resource: Error performing operation: Error occurred
Operation force-promote for pgsqld (ocf:heartbeat:pgsqlms)
returned 1 (error: Can not get current node LSN location)
/tmp:5432 - accepting connections


NEVER use "debug-promote" or other "debug-*" commands with pgsqlms, or
any other cloned resources. AFAIK, these commands work fine for
"stateless" resources, but do not (could not) create the required
environment for the clone and multi-state ones.

So I repeat, NEVER use "debug-promote".

What you want to do is setting the promotion score on the node you
want the
promotion to happen. Eg.:

   pcs node attribute srv1 master-pgsqld=1001

You can use "crm_attribute" or "crm_master" as well.


ocf-exit-reason:Can not get current node LSN location


This one is probably because of "debug-promote".


You have to 'cib-push' to "fix" this very problem.
In my(admin's) opinion this is a 100% candidate for a bug -
whether in PCS or PAF - perhaps authors may wish to comment?


Removing the node attributes with the resource might be legit from
the
Pacemaker point of view, but I'm not sure how they can track the
dependency
(ping Ken?).


Higher-level tools like pcs or crm shell could probably do it when
removing the resource (i.e. if the resource was a promotable clone,
check for and remove any node attributes of the form master-$RSC_ID).
That sounds like a good idea to me.


I put this on pcs todo list.

Regards,
Tomas



Pacemaker would be a bad place to do it because Pacemaker only sees the
newly modified CIB with the resource configuration gone -- it can't
know for sure whether it was a promotable clone, and it can only know
it existed at all if there is leftover status entries (causing the
resource to be listed as "orphaned"), which isn't guaranteed.



PAF has no way to know the resource is being deleted and cannot remove
its node attribute beforehand.

Maybe PCS can look for promotable score and remove them during the
"resource
delete" command (ping Tomas)?

Regards,



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Removing a resource without stopping it

2022-01-31 Thread Tomas Jelinek

Hi,

The output you posted actually shows the procedure you followed works. 
Orphan resources are running resources which have no configuration 
stored in CIB. Usually, they are stopped shortly after they are removed 
from CIB. If you set stop-orphan-resources to false, pacemaker won't 
stop them.


To summarize:
1) Set stop-orphan-resources to false. This prevents pacemaker from 
stopping a resource once it is removed from CIB.
2) Run `pcs resource delete srv01-cs8 --force`. Using --force makes pcs 
skip trying to stop the resource; pcs directly removes the resource from 
the CIB.
3) Confirm by running `pcs status`. If you see the resource is orphaned, 
it means the resource is still running after it has been removed from 
CIB and the only trace of it is in pacemaker state - pacemaker remembers 
it had started a resource which is now not present in CIB.


In this stage, when you flip stop-orphan-resources to true, pacemaker 
stops the resource as there is no record of it in the CIB anymore and 
therefore no reason for it to be running. Once stopped, the resource is 
removed from pacemaker state as well.
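
Condensed into commands, the non-destructive part looks like this (a
sketch, using the resource name from this thread):

  pcs property set stop-orphan-resources=false
  pcs resource delete srv01-cs8 --force
  pcs status    # srv01-cs8 shows as ORPHANED but keeps running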


Not sure if this helps you with the actual migration, but hopefully it 
at least clarified a bit what's going on.



Regards,
Tomas


Dne 29. 01. 22 v 6:12 Digimer napsal(a):

On 2022-01-29 00:10, Digimer wrote:

On 2022-01-28 16:54, Ken Gaillot wrote:

On Fri, 2022-01-28 at 16:38 -0500, Digimer wrote:

Hi all,

I'm trying to figure out how to move a running VM from one
pacemaker
cluster to another. I've got the storage and VM live migration
sorted,
but having trouble with pacemaker.

I tried unmanaging the resource (the VM), then deleted the
resource,
and the node got fenced. So I am assuming it thought it couldn't
stop
the service so it self-fenced. In any case, can someone let me know
what
the proper procedure is?

Said more directly;

How to I delete a resource from pacemaker (via pcs on EL8)
without
stopping the resource?


Set the stop-orphan-resources cluster property to false (at least while
you move it)

The problem with your first approach is that once you remove the
resource configuration, which includes the is-managed setting,
Pacemaker no longer knows the resource is unmanaged. And even if you
set it via resource defaults or something, eventually you have to set
it back, at which point Pacemaker will still have the same response.


Follow up;

  I tried to do the following sequence;


pcs property set stop-orphan-resources=false
pcs resource unmanage srv01-cs8             # Without this, the 
resource was stopped
pcs resource delete srv01-cs8                   # Failed with 
"Warning: 'srv01-cs8' is unmanaged"
pcs resource delete srv01-cs8 --force   # Got 'Deleting 
Resource - srv01-cs8'

pcs resource status
--
  * srv01-cs8   (ocf::alteeve:server):   ORPHANED Started an-a01n01 
(unmanaged)

--


  So it seems like this doesn't delete the resource. Can I get some 
insight on how to actually delete this resource without disabling the VM?


Thanks!


Adding;

I tried 'pcs property set stop-orphan-resources=true' and it stopped the 
VM and then actually deleted the resource. =/


--
Digimer
Papers and Projects:https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain 
than in the near certainty that people of equal talent have lived and died in cotton 
fields and sweatshops." - Stephen Jay Gould


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pcs stonith update problems

2021-11-30 Thread Tomas Jelinek

Dne 21. 07. 21 v 19:06 Digimer napsal(a):

On 2021-07-21 8:19 a.m., Tomas Jelinek wrote:

Dne 16. 07. 21 v 16:30 Digimer napsal(a):

On 2021-07-16 9:26 a.m., Tomas Jelinek wrote:

Dne 16. 07. 21 v 6:35 Andrei Borzenkov napsal(a):

On 16.07.2021 01:02, Digimer wrote:

Hi all,

     I've got a predicament... I want to update a stonith resource to
remove an argument. Specifically, when resources move nodes, I want to
change the stonith delay to favour the new host. This involves adding
the 'delay="x"' argument to one stonith resource, and removing it from
the other;

Example;


# pcs cluster cib | grep -B7 -A7 '"delay"'
     
   
     
     
     
     
     
   
   
     
   
     


Here, the stonith resource 'ipmilan_node1' has the delay="15".

If I run:


# pcs stonith update ipmilan_node1 fence_ipmilan ipaddr="10.201.17.1"
password="xxx" username="admin"; echo $?
0


I see nothing happen in journald, and the delay argument remains in
the
'pcs cluster cib' output. If, however, I do;


# /usr/sbin/pcs stonith update ipmilan_node1 fence_ipmilan
ipaddr="10.201.17.1" password="xxx" username="admin" delay="0";
echo $?
0


I can see in journald that the CIB was updated and can confirm in 'pcs
cluster cib' that the 'delay' value becomes '0'. So it seems that,
if an
argument previously existed and is NOT specified in an update, it
is not
removed.

Is this intentional for some reason? If so, how would I remove the
delay
attribute?


Yes, this is intentional. As far as I remember, update commands in pcs
have always worked this way:
* do not change attributes not specified in the command
* if an attribute is specified with an empty value, remove the attribute
from cluster configuration
* else set specified value of the specified attribute in cluster
configuration

This means you only need to specify attributes you want to change. You
don't need to bother with attributes you want to keep unchanged.

If you want to delete the delay attribute, you can do it like this:
pcs stonith update ipmilan_node1 delay=
This will remove delay and keep all the other attributes unchanged.

I'm not sure why this principle is not documented in pcs man page. We
can fix that, though.

Note that specifying a stonith agent in the update command does nothing
(which is expected) and is silently ignored by pcs (which is a bug).

Regards,
Tomas


Ah, thank you (and as Andrei)!

Is this behaviour not documented in 'pcs stonith help' -> update? Or was
I blind and missed it?



I think this isn't documented anywhere in pcs. It may be described in
Clusters from Scratch or a similar document. I think I've read it
somewhere, but I'm not sure where it was.

Anyway, I put it on our todo list to get it explained in pcs help and
man page.


Thanks,
Tomas


Thanks!



Well, it took four months, but it's upstream now. Hopefully this 
clarifies it

https://github.com/ClusterLabs/pcs/commit/54c4db0fdbb9caf3fbc28c2941cce89641992037
https://github.com/ClusterLabs/pcs/commit/3db16404c6b1a4fddcba7b9628c78dbad6d68d33

The fix will be included in the next versions of both supported branches 
(pcs-0.10 and pcs-0.11).


Regards,
Tomas

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.10.11 released

2021-10-06 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.11.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.11.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/refs/tags/v0.10.11.zip

This release contains two important bug fixes, one for creating
resources with 'depth' attribute, and one for 'pcs stonith
update-scsi-devices' command.


Complete change log for this release:
## [0.10.11] - 2021-10-05

### Added
-  Add add/remove cli syntax for command `pcs stonith
   update-scsi-devices` ([rhbz#1992668])

### Fixed
- Fixed an error when creating a resource which defines 'depth'
  attribute for its operations ([rhbz#1998454])
- Do not unfence newly added devices on fenced cluster nodes
  ([rhbz#1991654])
- Fix displaying fencing levels with regular expression targets
  ([rhbz#1533090])

## [0.10.10] - 2021-08-19

### Added
- Support for new role names introduced in pacemaker 2.1
  ([rhbz#1885293])

### Fixed
- Traceback in some cases when --wait without timeout is used


Thanks / congratulations to everyone who contributed to this release,
including Michal Pospisil, Miroslav Lisik, Ondrej Mular and Tomas
Jelinek.

Cheers,
Tomas


[rhbz#1533090]: https://bugzilla.redhat.com/show_bug.cgi?id=1533090
[rhbz#1885293]: https://bugzilla.redhat.com/show_bug.cgi?id=1885293
[rhbz#1991654]: https://bugzilla.redhat.com/show_bug.cgi?id=1991654
[rhbz#1992668]: https://bugzilla.redhat.com/show_bug.cgi?id=1992668
[rhbz#1998454]: https://bugzilla.redhat.com/show_bug.cgi?id=1998454

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pcs stonith update problems

2021-07-21 Thread Tomas Jelinek

Dne 16. 07. 21 v 16:30 Digimer napsal(a):

On 2021-07-16 9:26 a.m., Tomas Jelinek wrote:

Dne 16. 07. 21 v 6:35 Andrei Borzenkov napsal(a):

On 16.07.2021 01:02, Digimer wrote:

Hi all,

    I've got a predicament... I want to update a stonith resource to
remove an argument. Specifically, when resources move nodes, I want to
change the stonith delay to favour the new host. This involves adding
the 'delay="x"' argument to one stonith resource, and removing it from
the other;

Example;


# pcs cluster cib | grep -B7 -A7 '"delay"'
    
  
    
    
    
    
    
  
  
    
  
    


Here, the stonith resource 'ipmilan_node1' has the delay="15".

If I run:


# pcs stonith update ipmilan_node1 fence_ipmilan ipaddr="10.201.17.1"
password="xxx" username="admin"; echo $?
0


I see nothing happen in journald, and the delay argument remains in the
'pcs cluster cib' output. If, however, I do;


# /usr/sbin/pcs stonith update ipmilan_node1 fence_ipmilan
ipaddr="10.201.17.1" password="xxx" username="admin" delay="0"; echo $?
0


I can see in journald that the CIB was updated and can confirm in 'pcs
cluster cib' that the 'delay' value becomes '0'. So it seems that, if an
argument previously existed and is NOT specified in an update, it is not
removed.

Is this intentional for some reason? If so, how would I remove the delay
attribute?


Yes, this is intentional. As far as I remember, update commands in pcs
have always worked this way:
* do not change attributes not specified in the command
* if an attribute is specified with an empty value, remove the attribute
from cluster configuration
* else set specified value of the specified attribute in cluster
configuration

This means you only need to specify attributes you want to change. You
don't need to bother with attributes you want to keep unchanged.

If you want to delete the delay attribute, you can do it like this:
pcs stonith update ipmilan_node1 delay=
This will remove delay and keep all the other attributes unchanged.

I'm not sure why this principle is not documented in pcs man page. We
can fix that, though.

Note that specifying a stonith agent in the update command does nothing
(which is expected) and is silently ignored by pcs (which is a bug).

Regards,
Tomas


Ah, thank you (and as Andrei)!

Is this behaviour not documented in 'pcs stonith help' -> update? Or was
I blind and missed it?



I think this isn't documented anywhere in pcs. It may be described in 
Clusters from Scratch or a similar document. I think I've read it 
somewhere, but I'm not sure where it was.


Anyway, I put it on our todo list to get it explained in pcs help and 
man page.



Thanks,
Tomas

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pcs stonith update problems

2021-07-16 Thread Tomas Jelinek

Dne 16. 07. 21 v 6:35 Andrei Borzenkov napsal(a):

On 16.07.2021 01:02, Digimer wrote:

Hi all,

   I've got a predicament... I want to update a stonith resource to
remove an argument. Specifically, when resources move nodes, I want to
change the stonith delay to favour the new host. This involves adding
the 'delay="x"' argument to one stonith resource, and removing it from
the other;

Example;


# pcs cluster cib | grep -B7 -A7 '"delay"'
   
 
   
   
   
   
   
 
 
   
 
   


Here, the stonith resource 'ipmilan_node1' has the delay="15".

If I run:


# pcs stonith update ipmilan_node1 fence_ipmilan ipaddr="10.201.17.1"
password="xxx" username="admin"; echo $?
0


I see nothing happen in journald, and the delay argument remains in the
'pcs cluster cib' output. If, however, I do;


# /usr/sbin/pcs stonith update ipmilan_node1 fence_ipmilan
ipaddr="10.201.17.1" password="xxx" username="admin" delay="0"; echo $?
0


I can see in journald that the CIB was updated and can confirm in 'pcs
cluster cib' that the 'delay' value becomes '0'. So it seems that, if an
argument previously existed and is NOT specified in an update, it is not
removed.

Is this intentional for some reason? If so, how would I remove the delay
attribute?


Yes, this is intentional. As far as I remember, update commands in pcs 
have always worked this way:

* do not change attributes not specified in the command
* if an attribute is specified with an empty value, remove the attribute 
from cluster configuration
* else set specified value of the specified attribute in cluster 
configuration


This means you only need to specify attributes you want to change. You 
don't need to bother with attributes you want to keep unchanged.


If you want to delete the delay attribute, you can do it like this:
pcs stonith update ipmilan_node1 delay=
This will remove delay and keep all the other attributes unchanged.

I'm not sure why this principle is not documented in pcs man page. We 
can fix that, though.


Note that specifying a stonith agent in the update command does nothing 
(which is expected) and is silently ignored by pcs (which is a bug).


Regards,
Tomas



According to similar discussion a couple of days ago, you set attribute
to empty value: delay=""
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] pcs update resource command not working

2021-07-13 Thread Tomas Jelinek

Dne 09. 07. 21 v 7:29 S Sathish S napsal(a):

Hi Team,

we have found the cause of this problem: as per the changelog below, the pcs
resource update command doesn't support empty meta_attributes anymore.


https://github.com/ClusterLabs/pcs/blob/0.9.169/CHANGELOG.md

pcs resource update does not create an empty meta_attributes element any 
more (rhbz#1568353)


This bz is not related to your issue.



[root@node01 testadmin]# pcs resource update SNMP_node01 user='' 
extra_options="-E /opt/occ/CXP/tools/PCSXXX.sh"


Later modified to below command it work for us.

[root@node01 testadmin]# pcs resource update SNMP_node01 user='root' 
extra_options="-E /opt/occ/CXP/tools/PCSXXX.sh"




The commands work as expected, user='root' sets the value of 'user' to 
'root', user='' deletes 'user'.


Specifying an empty value for an option is a syntax for removing the 
option. Yes, this means there is no way to set an option to an empty 
string value using pcs.


But our problem is that the root user account is disabled by default, so we
are not sure which user can be given here, and if that user account gets
disabled or its password expires, what the impact will be on this cluster's
SNMP monitoring service, and so on.


While doing 'pcs resource create' for SNMP with an empty string for the user
attribute, it works for us; that is one more difference we noticed.


'pcs resource create' allows setting empty string values. It is a known 
bug which we track and will fix eventually.



Regards,
Tomas



Query:

 1) Is it recommended to create the SNMP ClusterMon resource type
with an empty user attribute?


     2) If not, and we update the resource with some user whose account
later gets disabled or whose password expires, what will the impact be on this
cluster's SNMP monitoring service, and so on?


Thanks and Regards,

S Sathish


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] Quorum when reducing cluster from 3 nodes to 2 nodes

2021-06-01 Thread Tomas Jelinek

Hi Robert,

Your corosync.conf looks fine to me, node app3 has been removed and 
two_node flag has been set properly. Try running 'pcs cluster reload 
corosync' and then check the quorum status. If it doesn't get fixed, 
take a look at /var/log/cluster/corosync.log to see if there are any 
issues reported.
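
Concretely, something like this on one of the remaining nodes (the expected
outcome below is an assumption based on the corosync.conf posted below):
pcs cluster reload corosync
pcs quorum status    # the 2Node flag should now show up and Quorum should drop to 1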


Regards,
Tomas


Dne 31. 05. 21 v 20:55 Hayden,Robert napsal(a):




-Original Message-
From: Users  On Behalf Of Tomas Jelinek
Sent: Monday, May 31, 2021 6:29 AM
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Quorum when reducing cluster from 3 nodes to 2
nodes

Hi Robert,

Can you share your /etc/corosync/corosync.conf file? Also check if it's
the same on all nodes.



I verified that the corosync.conf file is the same across the nodes.   As part of the troubleshooting, I 
manually ran the command "crm_node --remove=app3 --force" to remove the third node from the 
corosync configuration.   My concern is around why the quorum number did not auto downgrade to a value of 
"1", especially since we run with the "last_man_standing" flag.I suspect the issue is 
in the two-node special case.  That is, if I was removing a node from a 4+ node cluster, I would not have had 
an issue.

Here is the information you requested, slightly redacted for security.

root:@app1:/root
#20:45:02 # cat /etc/corosync/corosync.conf
totem {
 version: 2
 cluster_name: _app_2
 secauth: off
 transport: udpu
 token: 61000
}

nodelist {
 node {
 ring0_addr: app1
 nodeid: 1
 }

 node {
 ring0_addr: app2
 nodeid: 3
 }
}

quorum {
 provider: corosync_votequorum
 wait_for_all: 1
 last_man_standing: 1
 two_node: 1
}

logging {
 to_logfile: yes
 logfile: /var/log/cluster/corosync.log
 to_syslog: yes
}
root:@app1:/root
#20:45:12 # ssh app2 md5sum /etc/corosync/corosync.conf
d69b80cd821ff44224b56ae71c5d731c  /etc/corosync/corosync.conf
root:@app1:/root
#20:45:30 # md5sum /etc/corosync/corosync.conf
d69b80cd821ff44224b56ae71c5d731c  /etc/corosync/corosync.conf

Thanks
Robert


Dne 26. 05. 21 v 17:48 Hayden,Robert napsal(a):

I had a SysAdmin reduce the number of nodes in a OL 7.9 cluster from
three nodes to two nodes.

  From internal testing, I found the following commands would work and
the 2Node attribute would be automatically added.  The other cluster
parameters we use are WaitForAll and LastManStanding.

pcs resource disable res_app03

pcs resource delete res_app03

pcs cluster node remove app03

pcs stonith delete fence_app03

Unfortunately, real world didn't go as planned.   I am unsure if the
commands were run out of order or something else was going on (e.g.
unexpected location constraints).   When I got involved, I noticed that
pcs status had the app3 node in an OFFLINE state, but the pcs cluster
node remove app03 command was successful.   I noticed some leftover
location constraints from past "moves" of resources.  I manually removed
those constraints and I ended up removing the app03 node from the
corosync configuration with "crm_node --remove=app3 --force" command.


This removes the node from pacemaker config, not from corosync config.

Regards,
Tomas


 Now pcs status no longer shows any information for app3 and crm_node
-l does not show app3.

My concern is with Quorum.   From the pcs quorum status output below, I
still see Quorum set at 2 (expected to be 1) and the 2Node attribute was
not added.   Am I stuck in this state until the next full cluster
downtime?  Or is there a way to manipulate the expected quorum votes in
the run time cluster?

#17:25:08 # pcs quorum status

Quorum information

--

Date: Wed May 26 17:25:16 2021

Quorum provider:  corosync_votequorum

Nodes:2

Node ID:  3

Ring ID:  1/85

Quorate:  Yes

Votequorum information

--

Expected votes:   2

Highest expected: 2

Total votes:  2

Quorum:   2

Flags:Quorate WaitForAll LastManStanding

Membership information

--

  Nodeid  VotesQdevice Name

   1  1 NR app1

   3  1 NR app2 (local)




CONFIDENTIALITY NOTICE This message and any included attachments are from 
Cerner Corporation and are intended only for the addressee. The information 
contained in this message is confidential and may constitute inside or 
non-public information under international, federal, or state securities laws. 
Unauthorized forwarding, printing, copying, distribution, or use of such 
information is strictly prohibited and may be unlawful. If you are not the 
addressee, please promptly delete this message and notify the sender of the 
delivery error by e-mail or you may call Cerner's corporate offices in Kansas 
City, Missouri, U.S.A at (+1) (816)221-1024.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Quorum when reducing cluster from 3 nodes to 2 nodes

2021-05-31 Thread Tomas Jelinek

Hi Robert,

Can you share your /etc/corosync/corosync.conf file? Also check if it's 
the same on all nodes.


Dne 26. 05. 21 v 17:48 Hayden,Robert napsal(a):
I had a SysAdmin reduce the number of nodes in a OL 7.9 cluster from 
three nodes to two nodes.


 From internal testing, I found the following commands would work and 
the 2Node attribute would be automatically added.  The other cluster 
parameters we use are WaitForAll and LastManStanding.


pcs resource disable res_app03

pcs resource delete res_app03

pcs cluster node remove app03

pcs stonith delete fence_app03

Unfortunately, real world didn’t go as planned.   I am unsure if the 
commands were run out of order or something else was going on (e.g. 
unexpected location constraints).   When I got involved, I noticed that 
pcs status had the app3 node in an OFFLINE state, but the pcs cluster 
node remove app03 command was successful.   I noticed some leftover 
location constraints from past “moves” of resources.  I manually removed 
those constraints and I ended up removing the app03 node from the 
corosync configuration with “crm_node --remove=app3 --force" command.


This removes the node from pacemaker config, not from corosync config.

Regards,
Tomas

    Now pcs status no longer shows any information for app3 and crm_node 
-l does not show app3.


My concern is with Quorum.   From the pcs quorum status output below, I 
still see Quorum set at 2 (expected to be 1) and the 2Node attribute was 
not added.   Am I stuck in this state until the next full cluster 
downtime?  Or is there a way to manipulate the expected quorum votes in 
the run time cluster?


#17:25:08 # pcs quorum status

Quorum information

--

Date: Wed May 26 17:25:16 2021

Quorum provider:  corosync_votequorum

Nodes:    2

Node ID:  3

Ring ID:  1/85

Quorate:  Yes

Votequorum information

--

Expected votes:   2

Highest expected: 2

Total votes:  2

Quorum:   2

Flags:    Quorate WaitForAll LastManStanding

Membership information

--

     Nodeid  Votes    Qdevice Name

  1  1 NR app1

  3  1 NR app2 (local)

CONFIDENTIALITY NOTICE This message and any included attachments are 
from Cerner Corporation and are intended only for the addressee. The 
information contained in this message is confidential and may constitute 
inside or non-public information under international, federal, or state 
securities laws. Unauthorized forwarding, printing, copying, 
distribution, or use of such information is strictly prohibited and may 
be unlawful. If you are not the addressee, please promptly delete this 
message and notify the sender of the delivery error by e-mail or you may 
call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) 
(816)221-1024.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

2021-05-18 Thread Tomas Jelinek




Dne 18. 05. 21 v 14:55 fatcha...@gmx.de napsal(a):




Sent: Tuesday, 18 May 2021 at 14:49
From: fatcha...@gmx.de
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

Hi Andrei,Hi everybody,

...

and it works great. Thanks for the hint.
But the thing I still don't understand is why the cluster demotes its active
node for a short time when I re-enable a node from standby back to unstandby.
Is it not possible to join the drbd as secondary without demoting the primary
for a short moment?


Try adding interleave=true to your clones.


I tried this but it get me an error msg, what is wrong ?

  pcs resource update database_drbd ocf:linbit:drbd drbd_resource=drbd1 
promotable promoted-max=1 promoted-node-max=1 clone-max=2 clone-node-max=1 
notify=true interleave=true

Error: invalid resource options: 'clone-max', 'clone-node-max', 'interleave', 
'notify', 'promoted-max', 'promoted-node-max', allowed options are: 
'adjust_master_score', 'connect_only_after_promote', 'drbd_resource', 
'drbdconf', 'fail_promote_early_if_peer_primary', 
'ignore_missing_notifications', 'remove_master_score_if_peer_primary', 
'require_drbd_module_version_ge', 'require_drbd_module_version_lt', 
'stop_outdates_secondary', 'unfence_extra_args', 'unfence_if_all_uptodate', 
'wfc_timeout', use --force to override


or is it simply:
pcs resource update database_drbd-clone interleave=true ?


Hi fatcharly,

The error comes from the fact that the update command as you used it is 
trying to update instance attributes (that is options which pacemaker 
passes to a resource agent). The agent doesn't define options you named, 
therefore pcs prints an error.


You want to update meta attributes, which are options pacemaker is 
processing by itself. This is how you do it:


pcs resource meta database_drbd-clone interleave=true
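
To double-check that it took effect, you can inspect the clone afterwards;
on the pcs 0.10 series used here that would presumably be:
pcs resource config database_drbd-clone    # should now list interleave=true among the meta attributes
(on older pcs the equivalent is 'pcs resource show --full')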


Regards,
Tomas





Any suggestions are welcome

Stay safe and take care

fatcharly





Sent: Wednesday, 12 May 2021 at 19:04
From: "Andrei Borzenkov" 
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

On 12.05.2021 17:34, fatcha...@gmx.de wrote:

Hi Andrei, Hi everybody,



Sent: Wednesday, 12 May 2021 at 16:01
From: fatcha...@gmx.de
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

Hi Andrei, Hi everybody,



You need order fs_database after promote operation; and as I just found
pacemaker also does not reverse it correctly and executes fs stop and
drbd demote concurrently. So you need additional order constraint to
first stop fs then demote drbd.


is there so good doku about this, I don't know how to archive a "after promote 
operation" and how can I tell the pcs to first dismount the filesystem mountpoint 
and then demote the drbd-device.


ok, so I found something and used this:

pcs constraint order stop fs_logfiles then demote drbd_logsfiles-clone
pcs constraint order stop fs_database then demote database_drbd-clone

and it works great. Thanks for the hint.
But the thing I still don't understand is why the cluster demotes its active
node for a short time when I re-enable a node from standby back to unstandby.
Is it not possible to join the drbd as secondary without demoting the primary
for a short moment?


Try adding interleave=true to your clones.



Best regards and take care

fatcharly




Sorry but this is new for me.

Best regards and take care

fatcharly





Sent: Tuesday, 11 May 2021 at 17:19
From: "Andrei Borzenkov" 
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] 2 node mariadb-cluster - constraint-problems ?

On 11.05.2021 17:43, fatcha...@gmx.de wrote:

Hi,

I'm using a CentOS 8.3.2011 with a pacemaker-2.0.4-6.el8_3.1.x86_64 + 
corosync-3.0.3-4.el8.x86_64 and kmod-drbd90-9.0.25-2.el8_3.elrepo.x86_64.
The cluster consists of two nodes which are providing a ha-mariadb with the 
help of two drbd devices for the database and the logfiles. The corosync is 
working over two rings and both machines are virtual kvm-guests.

Problem:
Node susanne is the active node and lisbon is changing from standby to active;
susanne is trying to demote one drbd device but is failing to. The cluster
keeps working properly, but the error stays.
This is what happens:

Cluster Summary:
   * Stack: corosync
   * Current DC: lisbon (version 2.0.4-6.el8_3.1-2deceaa3ae) - partition with quorum
   * Last updated: Tue May 11 16:15:54 2021
   * Last change:  Tue May 11 16:15:42 2021 by root via cibadmin on susanne
   * 2 nodes configured
   * 11 resource instances configured

Node List:
   * Online: [ lisbon susanne ]

Active Resources:
   * HA_IP   (ocf::heartbeat:IPaddr2):Started susanne
   * Clone Set: database_drbd-clone [database_drbd] (promotable):
 * Masters: [ susanne ]
 * Slaves: [ lisbon ]
   * Clone Set: drbd_logsfiles-clone [drbd_logsfiles] (promotable):
 * drbd_logsfiles(ocf::linbit:drbd):  

Re: [ClusterLabs] multi-state constraints

2021-05-13 Thread Tomas Jelinek

Dne 11. 05. 21 v 20:22 Alastair Basden napsal(a):

Single location constraint may have multiple rules, I would assume pcs
supports it. It is certainly supported by crmsh.


Yes, it is supported by pcs. First, create a location rule constraint
with 'pcs constraint location ... rule'. Then you can add more rules to
it with 'pcs constraint rule add' command.


So:
pcs constraint location resourceClone rule role=master score=100 \#uname 
eq node1
pcs constraint location resourceClone rule add role=master score=50 
\#uname eq node2


Is that the same as:
pcs constraint location resourceClone rule role=master score=100 \#uname 
eq node1
pcs constraint location resourceClone rule role=master score=50 \#uname 
eq node2

?



The first two commands create a single constraint with two rules. The 
other two commands create two constraints with one rule each. So it's 
not the same strictly speaking, even though it has the same effect:
A location constraint may contain one or more top-level rules. The 
cluster will act as if there is a separate location constraint for each 
rule that evaluates as true. [1]


Also note your rule add command doesn't work, the syntax is:
pcs constraint rule add <constraint id> [id=<rule id>] [role=master|slave]
[score=<score>|score-attribute=<attribute>] <expression>
So you create a constraint, get its id from 'pcs constraint location' 
and then you add the second rule to the constraint using the id.
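
A minimal sketch of that workflow with the names from this thread (the
constraint id shown is only an assumed example; use whatever id
'pcs constraint --full' actually reports):
pcs constraint location resourceClone rule role=master score=100 \#uname eq node1
pcs constraint --full        # note the id of the constraint just created, e.g. location-resourceClone
pcs constraint rule add location-resourceClone role=master score=50 \#uname eq node2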


Regards,
Tomas

[1] 
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Explained/index.html#_using_rules_to_determine_resource_location






On Tue, 11 May 2021, Andrei Borzenkov wrote:


[EXTERNAL EMAIL]

On Tue, May 11, 2021 at 10:50 AM Alastair Basden
 wrote:


Hi Andrei, all,

So, what I want to achieve is that if both nodes are up, node1
preferentially has drbd as master.  If that node fails, then node2
should
become master.  If node1 then comes back online, it should become master
again.

I also want to avoid node3 and node4 ever running drbd, since they don't
have the disks.

For the link below about promotion scores, what is the pcs command to
achieve this?  I'm unfamiliar with where the xml goes...



I do not normally use PCS so am not familiar with its syntax. I assume
there should be documentation that describes how to define location
constraints with rules. Maybe someone who is familiar with it can
provide an example.




I notice that drbd9 has an auto promotion feature, perhaps that would
help
here, and so I can forget about configuring drbd in pacemaker?  Is 
that

how it is supposed to work?  i.e. I can just concentrate on the
overlying
file system.

Sorry that I'm being a bit slow about all this.

Thanks,
Alastair.

On Tue, 11 May 2021, Andrei Borzenkov wrote:


[EXTERNAL EMAIL]

On 10.05.2021 20:36, Alastair Basden wrote:

Hi Andrei,

Thanks.  So, in summary, I need to:
pcs resource create resourcedrbd0 ocf:linbit:drbd
drbd_resource=disk0 op
monitor interval=60s
pcs resource master resourcedrbd0Clone resourcedrbd0 master-max=1
master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs constraint location resourcedrb0Clone prefers node1=100
pcs constraint location resourcedrb0Clone prefers node2=50
pcs constraint location resourcedrb0Clone avoids node3
pcs constraint location resourcedrb0Clone avoids node4

Does this mean that it will prefer to run as master on node1, and
slave
on node2?


No. I already told you so.


   If not, how can I achieve that?



DRBD resource agents sets master scores based on disk state. If you
statically override this decision you are risking promoting stale 
copy
which means data loss (I do not know if agent allows it, 
hopefully not;

but then it will continue to attempt to promote wrong copy and
eventually fail). But if you insist, it is documented:

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-promotion-scores.html 




Also statically biasing one single node means workload will be
relocated
every time node becomes available, which usually implies additional
downtime. That is something normally avoided (which is why resource
stickiness exists).
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




Re: [ClusterLabs] multi-state constraints

2021-05-11 Thread Tomas Jelinek

Hi,

Dne 11. 05. 21 v 17:31 Andrei Borzenkov napsal(a):

On 11.05.2021 18:20, Alastair Basden wrote:

Hi,

So, I think the following would do it:
pcs constraint location resourceClone rule role=master score=100 uname
eq node1
pcs constraint location resourceClone rule role=master score=50 uname eq
node2



Single location constraint may have multiple rules, I would assume pcs
supports it. It is certainly supported by crmsh.


Yes, it is supported by pcs. First, create a location rule constraint 
with 'pcs constraint location ... rule'. Then you can add more rules to 
it with 'pcs constraint rule add' command.



But, I'm unsure about uname.  In the xml example you showed, it had
#uname.  Is that correct, or do I use uname without the hash?



This should be #uname - it is special attribute.


Correct, the attribute is #uname. You just need to prevent your shell 
from interpreting the # sign. Either do \#uname or '#uname'.


Regards,
Tomas


So perhaps:
pcs constraint location resourceClone rule role=master score=50 \#uname
eq node2

Cheers,
Alastair.

On Tue, 11 May 2021, Andrei Borzenkov wrote:


[EXTERNAL EMAIL]

On Tue, May 11, 2021 at 10:50 AM Alastair Basden
 wrote:


Hi Andrei, all,

So, what I want to achieve is that if both nodes are up, node1
preferentially has drbd as master.  If that node fails, then node2
should
become master.  If node1 then comes back online, it should become master
again.

I also want to avoid node3 and node4 ever running drbd, since they don't
have the disks.

For the link below about promotion scores, what is the pcs command to
achieve this?  I'm unfamiliar with where the xml goes...



I do not normally use PCS so am not familiar with its syntax. I assume
there should be documentation that describes how to define location
constraints with rules. Maybe someone who is familiar with it can
provide an example.




I notice that drbd9 has an auto promotion feature, perhaps that would
help
here, and so I can forget about configuring drbd in pacemaker?  Is that
how it is supposed to work?  i.e. I can just concentrate on the
overlying
file system.

Sorry that I'm being a bit slow about all this.

Thanks,
Alastair.

On Tue, 11 May 2021, Andrei Borzenkov wrote:


[EXTERNAL EMAIL]

On 10.05.2021 20:36, Alastair Basden wrote:

Hi Andrei,

Thanks.  So, in summary, I need to:
pcs resource create resourcedrbd0 ocf:linbit:drbd
drbd_resource=disk0 op
monitor interval=60s
pcs resource master resourcedrbd0Clone resourcedrbd0 master-max=1
master-node-max=1 clone-max=2 clone-node-max=1 notify=true

pcs constraint location resourcedrb0Clone prefers node1=100
pcs constraint location resourcedrb0Clone prefers node2=50
pcs constraint location resourcedrb0Clone avoids node3
pcs constraint location resourcedrb0Clone avoids node4

Does this mean that it will prefer to run as master on node1, and
slave
on node2?


No. I already told you so.


   If not, how can I achieve that?



DRBD resource agents sets master scores based on disk state. If you
statically override this decision you are risking promoting stale copy
which means data loss (I do not know if agent allows it, hopefully not;
but then it will continue to attempt to promote wrong copy and
eventually fail). But if you insist, it is documented:

https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/s-promotion-scores.html


Also statically biasing one single node means workload will be
relocated
every time node becomes available, which usually implies additional
downtime. That is something normally avoided (which is why resource
stickiness exists).
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Stopping the last node with pcs

2021-04-29 Thread Tomas Jelinek

Hi,

sorry for the late answer and thanks everyone who already responded.

This feature was implemented a long time ago based on
https://bugzilla.redhat.com/show_bug.cgi?id=1180506
The request was: pcs cluster stop -> this operation should verify if 
removing a node from the cluster will cause loss of quorum and abort. Of 
course we want to allow a manual override.


And that's what we did.

The idea is to warn users that by stopping specified node(s) or the 
local node, the cluster will lose quorum and stop all resources. If the 
last node is being stopped, then obviously all resources will be 
stopped. The wording of the message could be improved in this case. But 
in general, I agree with Ken and lean towards keeping the behavior.


As Ulrich pointed out, 'pcs cluster stop --all' doesn't check for quorum 
loss, since the user made it clear (by specifying --all), that they want 
to stop the whole cluster.
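
So when stopping that last node really is what you want, either of these
(both already mentioned in this thread) will do it:
pcs cluster stop --force    # stop this node and accept losing quorum
pcs cluster stop --all      # stop the whole cluster; no quorum-loss check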


Regards,
Tomas


Dne 28. 04. 21 v 17:41 Digimer napsal(a):

On 2021-04-28 10:10 a.m., Ken Gaillot wrote:

On Tue, 2021-04-27 at 23:23 -0400, Digimer wrote:

Hi all,

   I noticed something odd.


[root@an-a02n01 ~]# pcs cluster status
Cluster Status:
  Cluster Summary:
* Stack: corosync
* Current DC: an-a02n01 (version 2.0.4-6.el8_3.2-2deceaa3ae) -
partition with quorum
* Last updated: Tue Apr 27 23:20:45 2021
* Last change:  Tue Apr 27 23:12:40 2021 by root via cibadmin on
an-a02n01
* 2 nodes configured
* 12 resource instances configured (4 DISABLED)
  Node List:
* Online: [ an-a02n01 ]
* OFFLINE: [ an-a02n02 ]

PCSD Status:
   an-a02n01: Online
   an-a02n02: Offline

[root@an-a02n01 ~]# pcs cluster stop
Error: Stopping the node will cause a loss of the quorum, use --force
to
override


   Shouldn't pcs know it's the last node and shut down without
complaint?


It knows, it's just not sure you know :)

pcs's design philosophy is to hand-hold users by default and give
expert users --force.

The idea in this case is that (especially in 3-to-5-node clusters)
someone might not realize that stopping one node could make all
resources stop cluster-wide.


This makes total sense in 3+ node cluster. However, when you're asking
the last node in a two-node cluster to stop, then it seems odd. Perhaps
overriding this behaviour when 2-node is set?

In any case, I'm calling this from a program and that means I need to
use '--force' all the time (or add some complex logic of my own, which I
can do).

Well anyway, now I know it was intentional. :)



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.9.170 released

2021-04-16 Thread Tomas Jelinek

I am happy to announce the latest release of pcs-0.9, version 0.9.170.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.170.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.170.zip

This is the final release of pcs-0.9 branch.
The branch supports clusters running Pacemaker 1.x on top of Corosync
2.x or Corosync 1.x with CMAN. Those versions are near or past EOL.

Development of pcs-0.10 branch supporting the latest versions of HA
Cluster stack components continues.


Complete change log for this release:
## [0.9.170] - 2021-04-16

### Security
- Web UI sends HTTP header Content-Security-Policy as an alternative to
  X-Frame-Options
- Added support for loading DH keys from a file ([rhbz#1888479])

### Added
- `pcs resource [safe-]disable --simulate` has a new option `--brief` to
  print only a list of affected resources ([rhbz#1833115])

### Fixed
- Keep autogenerated IDs of set constraints reasonably short
  ([rhbz#1387358], [rhbz#1824206])
- Race-condition when removing multiple resources from web UI
  ([rhbz#1843593])
- Explicitly close libcurl connections to prevent stalled TCP
  connections in CLOSE-WAIT state ([ghissue#261], [rhbz#1870551])


Thanks / congratulations to everyone who contributed to this release,
including Ondrej Mular and Tomas Jelinek.

Cheers,
Tomas


[ghissue#261]: https://github.com/ClusterLabs/pcs/issues/261
[rhbz#1387358]: https://bugzilla.redhat.com/show_bug.cgi?id=1387358
[rhbz#1824206]: https://bugzilla.redhat.com/show_bug.cgi?id=1824206
[rhbz#1833115]: https://bugzilla.redhat.com/show_bug.cgi?id=1833115
[rhbz#1843593]: https://bugzilla.redhat.com/show_bug.cgi?id=1843593
[rhbz#1870551]: https://bugzilla.redhat.com/show_bug.cgi?id=1870551
[rhbz#1888479]: https://bugzilla.redhat.com/show_bug.cgi?id=1888479

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] best practice for scripting

2021-04-13 Thread Tomas Jelinek

Hi,

You can use 'pcs cluster cib' for pacemaker configuration and 'pcs 
cluster status xml' for pacemaker status. Both commands basically just 
pass xml obtained from pacemaker, though. As far as I know, corosync 
also provides parsable output, take a look at corosync-cmapctl. I'm not 
sure about other cluster components.
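
As a quick sketch of the commands mentioned above (the redirect targets are
just examples):
pcs cluster cib > cib.xml       # pacemaker configuration (CIB) as XML
pcs status xml > status.xml     # pacemaker status as XML
corosync-cmapctl                # corosync runtime configuration, one key = value per line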


Be aware that 'pcs status xml' may change in a near future. The reason 
for that is that the xml produced by pacemaker is also changing as a 
part of an effort to provide standardized xml output from most of (or 
all) pacemaker tools.


As far as pcs text output changes: Yes, they are happening. Sometimes 
it's a result of a change in another cluster component (e.g. 'pcs 
status' prints text output from crm_mon). In other cases, it's due to 
adding features, fixing bugs, standardizing output format, etc. It's not 
always possible to avoid the changes.


As we are aware of the difficulties of using pcs in scripts, we indeed 
have a long term goal to provide machine readable output from pcs. Since 
pcs started with a focus on producing human readable output, it's a lot 
of work to do. Quite a big part of pcs code base cannot be easily 
switched to producing machine parsable output. We are slowly moving 
towards it (which in some cases may cause text output changes due to 
code reuse), but it is currently not seen as something to be finished in 
a year.


Regards,
Tomas


Dne 13. 04. 21 v 9:17 d tbsky napsal(a):

Hi:
 I have some scripts which use 'pcs' and 'crm_mon'. I prefer pcs
since it is an all-in-one tool, but besides 'pcs cluster cib' it has
no stable text output. Reading the pacemaker 2.1 documentation I found it
says:

"In addition to crm_mon and stonith_admin, the crmadmin, crm_resource,
crm_simulate, and crm_verify commands now support the --output-as and
--output-to options, including XML output (which scripts and
higher-level tools are strongly recommended to use instead of trying
to parse the text output, which may change from release to release)."

So if I need to parse the command output, I think I should study these
commands instead of pcs in the future? Red Hat is doing a great job of
maintaining the 'pcs' output consistency between minor versions (e.g.
7.2 -> 7.3), but with the major version 7 -> 8 many things changed. If there
is a more stable method for scripting I would like to follow it.

thanks a lot for help!
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Tomas Jelinek
If you stopped a node and you want it to start and reconnect to its 
cluster, run 'pcs cluster start' on the node. You may also run 'pcs 
cluster start --all' or (in your case) 'pcs cluster start node1' on any 
cluster node.


Tomas


Dne 29. 03. 21 v 16:25 Jason Long napsal(a):

Thank you.
Then, if a node is disconnected, how can it get back into the cluster chain?






On Monday, March 29, 2021, 06:13:09 PM GMT+4:30, Tomas Jelinek 
 wrote:





Hi Jason,

Regarding point 3:
Most pcs commands operate on the local node. If you stop a cluster on a
node, pcs is unable to connect to cluster daemons on the node (since
they are not running) and prints an error message denoting that. This is
expected behavior.

Regards,
Tomas


Dne 27. 03. 21 v 6:54 Jason Long napsal(a):

Thank you.
I have other questions:

1- How can I launch a test lab?
2- Why, when I stop node1 manually and then start it again, can't I browse
"http://127.0.0.1:2080"? I think when I stopped node1, Pacemaker forgot to
bring it back into the chain!!!
3- Why, when I stopped node1, does the "pcs status nodes" command not work? It shows me
"Error: error running crm_mon, is pacemaker running?".






On Thursday, March 25, 2021, 09:08:45 PM GMT+4:30, Ken Gaillot 
 wrote:





On Thu, 2021-03-25 at 14:44 +, Jason Long wrote:

Then, how can I be sure my configuration is OK?
In a clustering environment, when a node is disconnected then another
node must replace it. Am I right?
I did a test:
I defined a NAT interface for my VM2 (node2) and used port
forwarding: "127.0.0.1:2090" on Host  FORWARDING TO 127.0.0.1:80 on
Guest.
When node1 is OK and I browse "http://127.0.0.1:2080" then it showed
me "My Test Site - node1", but when I browse "http://127.0.0.1:2090"
then it doesn't show anything.
I stopped node1 and when I browse "http://127.0.0.1:2080" it doesn't
show anything, but when I browse "http://127.0.0.1:2090", then it
showed me "My Test Site - node2".
Could this mean that my cluster is working properly?


Port-forwarding to a single VM can never allow the other VM to take
over.

The intent of the floating IP address is to have a single, unique
address that users can use to contact the service. The cluster can move
this IP to one VM or the other, and that is invisible to users. The
term "floating" is intended to convey this, that the IP address is not
tied to a single node, but can move ("float") from one node to another,
transparently to users using the IP address.

In this case, the floating IP would take the place of the 127.0.0.1
port-forwarding addresses. Instead of two port-forwarding addresses,
you just have the one floating IP address.

How you get that working with a reverse proxy is up to you. The
Clusters from Scratch example shows how to do it with a web server, to
present the concepts, and you can tailor that to any service that needs
to be clustered.



On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger <
kwenn...@redhat.com> wrote:





On 3/25/21 9:55 AM, Jason Long wrote:

Thank you so much.

       Now you can proceed with the "Add Apache HTTP" section.


What does it mean? I did all steps in the document.


       Once apache is set up as a cluster resource, you should be
able to contact the web server at the floating IP...


# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
#
# pcs status
Error: error running crm_mon, is pacemaker running?
       Could not connect to the CIB: Transport endpoint is not
connected
       crm_mon: Error: cluster is not available on this node
#
# curl http://192.168.56.9
       My Test Site - node2

Thank you about it, but I want to use these two VMs as an Apache
Reverse Proxy Server. When one of my nodes stopped, then another
node start servicing.

My test lab use VirtualBox with two VMs as below:
VM1: This VM has two NICs (NAT, Host-only Adapter)
VM2: This VM has one NIC (Host-only Adapter)

On VM1, I use the NAT interface for the port forwarding:
"127.0.0.1:2080" on Host  FORWARDING TO 127.0.0.1:80 on Guest.

When I stopped node1 and browse "http://127.0.0.1:2080" then I
can't see anything. I want it to show me "My Test Site - node2". I
think it is reasonable because when one of my Reverse Proxy Servers
(node1) stopped, then the other Reverse Proxy Server (node2) started.

How can I achieve this goal?


Definitely not using that NAT interface I would say.
It will just be able to connect you with a service running on VM1.
And that doesn't make any sense seen from a high-availability
point of view. Even if you setup NAT that would make the
proxy on node2 visible via VM1 this wouldn't give you
increased availability - rather the opposite due to increased
complexity. In high-availability we are speaking of a Single
Point of Failure (SPOF) which VM1 is gonna be here and what you
never ever wanna have.






On 

Re: [ClusterLabs] WebSite_start_0 on node2 'error' (1): call=6, status='complete', exitreason='Failed to access httpd status page.'

2021-03-29 Thread Tomas Jelinek

Hi Jason,

Regarding point 3:
Most pcs commands operate on the local node. If you stop a cluster on a 
node, pcs is unable to connect to cluster daemons on the node (since 
they are not running) and prints an error message denoting that. This is 
expected behavior.


Regards,
Tomas


Dne 27. 03. 21 v 6:54 Jason Long napsal(a):

Thank you.
I have other questions:

1- How can I launch a test lab?
2- Why, when I stop node1 manually and then start it again, can't I browse
"http://127.0.0.1:2080"? I think when I stopped node1, Pacemaker forgot to
bring it back into the chain!!!
3- Why, when I stopped node1, does the "pcs status nodes" command not work? It shows me
"Error: error running crm_mon, is pacemaker running?".






On Thursday, March 25, 2021, 09:08:45 PM GMT+4:30, Ken Gaillot 
 wrote:





On Thu, 2021-03-25 at 14:44 +, Jason Long wrote:

Then, how can I be sure my configuration is OK?
In a clustering environment, when a node is disconnected then another
node must replace it. Am I right?
I did a test:
I defined a NAT interface for my VM2 (node2) and used port
forwarding: "127.0.0.1:2090" on Host  FORWARDING TO 127.0.0.1:80 on
Guest.
When node1 is OK and I browse "http://127.0.0.1:2080" then it showed
me "My Test Site - node1", but when I browse "http://127.0.0.1:2090"
then it doesn't show anything.
I stopped node1 and when I browse "http://127.0.0.1:2080" it doesn't
show anything, but when I browse "http://127.0.0.1:2090", then it
showed me "My Test Site - node2".
Could this mean that my cluster is working properly?


Port-forwarding to a single VM can never allow the other VM to take
over.

The intent of the floating IP address is to have a single, unique
address that users can use to contact the service. The cluster can move
this IP to one VM or the other, and that is invisible to users. The
term "floating" is intended to convey this, that the IP address is not
tied to a single node, but can move ("float") from one node to another,
transparently to users using the IP address.

In this case, the floating IP would take the place of the 127.0.0.1
port-forwarding addresses. Instead of two port-forwarding addresses,
you just have the one floating IP address.

How you get that working with a reverse proxy is up to you. The
Clusters from Scratch example shows how to do it with a web server, to
present the concepts, and you can tailor that to any service that needs
to be clustered.



On Thursday, March 25, 2021, 05:20:33 PM GMT+4:30, Klaus Wenninger <
kwenn...@redhat.com> wrote:





On 3/25/21 9:55 AM, Jason Long wrote:

Thank you so much.

     Now you can proceed with the "Add Apache HTTP" section.


What does it mean? I did all steps in the document.


     Once apache is set up as a cluster resource, you should be
able to contact the web server at the floating IP...


# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...
#
# pcs status
Error: error running crm_mon, is pacemaker running?
     Could not connect to the CIB: Transport endpoint is not
connected
     crm_mon: Error: cluster is not available on this node
#
# curl http://192.168.56.9
     My Test Site - node2

Thank you about it, but I want to use these two VMs as an Apache
Reverse Proxy Server. When one of my nodes stopped, then another
node start servicing.

My test lab use VirtualBox with two VMs as below:
VM1: This VM has two NICs (NAT, Host-only Adapter)
VM2: This VM has one NIC (Host-only Adapter)

On VM1, I use the NAT interface for the port forwarding:
"127.0.0.1:2080" on Host  FORWARDING TO 127.0.0.1:80 on Guest.

When I stopped node1 and browse "http://127.0.0.1:2080" then I
can't see anything. I want it to show me "My Test Site - node2". I
think it is reasonable because when one of my Reverse Proxy Servers
(node1) stopped, then the other Reverse Proxy Server (node2) started.

How can I achieve this goal?


Definitely not using that NAT interface I would say.
It will just be able to connect you with a service running on VM1.
And that doesn't make any sense seen from a high-availability
point of view. Even if you setup NAT that would make the
proxy on node2 visible via VM1 this wouldn't give you
increased availability - rather the opposite due to increased
complexity. In high-availability we are speaking of a Single
Point of Failure (SPOF) which VM1 is gonna be here and what you
never ever wanna have.






On Wednesday, March 24, 2021, 10:21:09 PM GMT+4:30, Ken Gaillot <
kgail...@redhat.com> wrote:





On Wed, 2021-03-24 at 10:50 +, Jason Long wrote:

Thank you.
From node1 and node2, I can ping the floating IP address
(192.168.56.9).
I stopped node1:

# pcs cluster stop node1
node1: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (corosync)...

And from both machines, I can ping the floating IP address:

[root@node1 ~]# ping 192.168.56.9
PING 192.168.56.9 (192.168.56.9) 56(84) bytes of data.
64 bytes from 192.168.56.9: icmp_seq=1 ttl=64 time=0.504 ms
64 bytes from 192.168.56.9: 

Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-12 Thread Tomas Jelinek

Hi Sathish,

Sorry, I don't know how to get ruby 2.2.0 for RHEL 7.

The types of support have been explained by Ken. It has already been 
said that pcs-0.10 is not supported on RHEL 7.


I would recommend to either downgrade to pacemaker 1.x and pcs-0.9, as 
suggested by Ken, or upgrade your nodes to RHEL 8. Either of the options 
would get you to a supported configuration.



Regards,
Tomas


Dne 11. 03. 21 v 15:18 S Sathish S napsal(a):

Hi Tomas,

Thanks for your response.

The Python 3.6+ package is available in the RHEL 7 stream, but for ruby 2.2.0+
there is no package available in the RHEL 7 stream. How can we overcome this
problem? Can you provide a way forward for the same?


These are also stated as runtime dependencies of pcs and pcsd.

Thanks and Regards,

S Sathish S



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker-2.0.x version support on “RHEL 7” OS

2021-03-10 Thread Tomas Jelinek

Hi,

The same principles apply to both pcs and pacemaker (and most probably 
the whole cluster stack).


Red Hat only supports the packages it provides, which is pcs-0.9 series 
in RHEL 7.


Even if you manage to install ruby 2.2.0+ and python 3.6+ on RHEL 7 
hosts and build and run pcs-0.10 on top of that, Red Hat won't support it.



Regards,
Tomas


Dne 10. 03. 21 v 8:15 S Sathish S napsal(a):

Hi Ken/Team,

We are using pacemaker from the ClusterLabs upstream, version 2.0.2, on our
RHEL 7 system. To fix CVE-2020-25654 we don't want to downgrade to the older
pacemaker 1.x, hence we are trying to build the latest pcs-0.10 version from
upstream source. It has a runtime dependency on ruby 2.2.0+, which is not
available in the RHEL 7.x stream, and we are getting compilation errors.
Please check and advise us whether pcs-0.10 is supported on RHEL 7.


We also need to understand the ClusterLabs support terms for pacemaker 1.x and
pacemaker 2.x, i.e. whether new features, security fixes and bug fixes will be
handled for both branches.


Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] pcs resource disable/enable is taking longer time in 32 node cluster setup for all components up and running.

2021-03-08 Thread Tomas Jelinek

Hi,

What method are you using to stop all resources?

If you run 'pcs resource disable <resource id>' in a loop, then it may 
take time for 300+ resources. By doing it this way, you run pacemaker 
scheduler for stopping each resource individually.


If you want to stop all resources, just run 'pcs property set 
stop-all-resources=true'.
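
(To let resources run again afterwards, presumably the property is simply
flipped back:)
pcs property set stop-all-resources=false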


If you want to stop all resources and control them individually, do it 
like this:

1) pcs cluster cib > original.xml
2) cp original.xml new.xml
3) repeat for all resources: pcs -f new.xml resource disable <resource id>
4) pcs cluster cib-push new.xml diff-against=original.xml
This way, pacemaker scheduler is run only once.
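
A minimal shell sketch of those steps (how you enumerate the resource ids is
up to you; the 'pcs resource show | awk' parsing below is only an assumption
and will also pick up group/clone header lines):
pcs cluster cib > original.xml
cp original.xml new.xml
for res in $(pcs resource show | awk '{print $1}'); do
    pcs -f new.xml resource disable "$res"
done
pcs cluster cib-push new.xml diff-against=original.xml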

Regards,
Tomas


Dne 05. 03. 21 v 8:19 S Sathish S napsal(a):

Hi Team,

We have set up a 32-node pacemaker cluster; each node has 10 resources, so in
total around 300+ resources are up and running. Whenever we perform the pcs
resource disable/enable commands for all resources in a script, it takes 5-8
minutes in our test lab.


If we execute the same on single-node, 3-node and 5-node setups, it takes
less than 2 minutes. We never experienced this delay on clusters of up to 9 nodes.


Currently, we don't have a 32-node cluster lab setup to reproduce the
issue and collect the logs. We would like to confirm: are there any
known issues or similar challenges when pcs is managing a 32-node
cluster setup?


Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] Updating CIB Question Update does not conform to the configured schema

2021-02-22 Thread Tomas Jelinek

Hi,

It seems to me that you defined 
"virtualip-instance_attributes-secondary_ip_address" id twice. That is 
not allowed, IDs must be unique.


Also note that multiple instance attributes with rules are not supported 
by pcs. Your CIB will work in general, just don't use pcs to update your 
'virtualip' resource instance attributes.


Regards,
Tomas


Dne 22. 02. 21 v 3:51 AM. E. napsal(a):

Hi,
I am using version 1.1.18 of pacemaker on my linux ubuntu 16.04.
I am trying to push cib after updating and I keep getting the error below:

$ sudo pcs cluster cib-push cib_temp
Error: unable to push cib
Call cib_replace failed (-203): Update does not conform to the 
configured schema


What I have added was the sections in bold (I am trying to implement
multi subnet deployment):
[The quoted CIB XML was mangled by the list archive. It showed the 'virtualip'
resource (type "awsvip") with two rule-based instance_attributes blocks, one
per subnet (referencing "ag1" and "ag2"); each block contained an nvpair with
the same id "virtualip-instance_attributes-secondary_ip_address",
name="secondary_ip_address", with values "10.45.54.47" and "10.45.54.147"
respectively.]

Did I violate the schema ?  I saw a post at 
https://www.drware.com/configure-multiple-subnet-alwayson-availability-groups-by-modifying-cib/ 


Where the same portion was updated and there was no such issue?

I am using this version
$ pacemakerd --version
Pacemaker 1.1.14

Any insights?
Thanks
Ayman Els

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





Re: [ClusterLabs] @ - unsupported char - bug or something to improve upon?

2021-02-22 Thread Tomas Jelinek

Hi all,

The error message comes from pcs.

Pacemaker doesn't allow '@' to be used in a resource name. The name is 
an XML ID and those cannot contain '@' (and some other) characters. If 
pcs allowed '@' in the resource name, pacemaker would reject such a CIB. 
As long as this is the case, there's nothing to fix in pcs.


There's really no option other than replacing '@' in the resource name 
as Valentin pointed out.


Regards,
Tomas


Dne 20. 02. 21 v 11:55 Valentin Vidić napsal(a):

On Sat, Feb 20, 2021 at 09:35:33AM +, lejeczek wrote:

-> $ pcs resource create "dropbox\@me" systemd:"dropbox\@me"

as you can see I've been trying to escape '@' too, in various ways, to no
avail.


Right, but just changing the resource name should work:

   $ pcs resource create dropbox_me systemd:dropbox@me



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Old pcs-0.9.167/0.9.168/0.9.169 package with newer corosync-3.1 and pacemaker-2.1 on RHEL 7.7

2021-02-08 Thread Tomas Jelinek

Hi,

There are significant changes between corosync 2.x and 3.x and similarly 
between pacemaker 1.x and 2.x. To cover those and support the new 
versions, we created pcs-0.10 branch. The old versions are supported by 
pcs-0.9 branch.


RHEL 7 ships corosync 2.x and pacemaker 1.x, so we build pcs-0.9.x for 
it. Corosync 3.x and pacemaker 2.x are shipped in RHEL 8, so that's 
where pcs-0.10 goes.


RHEL 8 also ships new version of ruby and python. So on top of 
modifications needed for corosync 3.x and pacemaker 2.x support, we used 
the opportunity to get rid of legacy code supporting those end-of-life 
python and ruby versions.



Regards,
Tomas


Dne 08. 02. 21 v 8:42 Fabio M. Di Nitto napsal(a):

Hi,

adding the lead developer for pcs, and the pacemaker user devel mailing 
list.


This mailing list is specific to kronosnet development.

On 05/02/2021 10.17, RUGINA Szabolcs-Gavril wrote:

Hi folks,

Thank you for your time and will to help.

We are under high pressure because of some problems with the RHEL HA Cluster
(Corosync + Pacemaker + DRBD), and we arrived at the point where we upgraded
corosync 2.4.3 to 3.1 and pacemaker 1.1.20 to 2.1, but then our setup gives
errors with the command


“pcs property set no-quorum-policy=stop”.

The error is:

No such file or directory', 'command': '/usr/libexec/pacemaker/pengine metadata'


Indeed /usr/libexec/pacemaker/pengine does not exist; I assume that is OK,
because in the new pacemaker 2.1 the internal structure was changed.


In the meantime we found out that we could achieve the same setting
using 'crm_attribute --name no-quorum-policy --update stop'.


My questions are:

  * Is there a reason that pcs has no build for RHEL 7?


I can only answer this question: newer versions of pcs need ruby and
python in versions that are not supported on RHEL 7, and even if we could
build the package, it won't run.


Fabio


  * Is crm_attribute the right choice for setting cluster properties,
    and will it cover all use cases?

Thank You!

BR,

Szabolcs


___
Users mailing list
us...@lists.kronosnet.org
https://lists.kronosnet.org/mailman/listinfo/users





___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker 2.0.5 version pcs status resources command is not working

2021-02-04 Thread Tomas Jelinek

Hi,

pcs-0.9 does not support pacemaker >= 2.0.0. You can go with pcs-0.9 + 
corosync < 3 + pacemaker 1.x OR pcs-0.10 + corosync 3.x + pacemaker 2.x. 
Combination of corosync 2 + pacemaker 2 is not supported in any pcs 
version, even though it may work to some degree.


Regards,
Tomas


Dne 03. 02. 21 v 22:28 Reid Wahl napsal(a):
With that in mind, I'd suggest upgrading to a newer pcs version if 
possible. If not, then you may have to do something more hack-y, like 
`pcs status | grep '(.*:.*):'`.
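
If parsing text is unavoidable anyway, another option (not from this thread,
just a suggestion) is to query pacemaker's own tools directly:
crm_resource --list    # list configured resources straight from pacemaker
crm_mon -1             # one-shot status snapshot including the resource section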


On Wed, Feb 3, 2021 at 1:26 PM Reid Wahl wrote:


Looks like pcs-0.9 isn't fully compatible with pacemaker >= 2.0.3:
   - https://github.com/ClusterLabs/pcs/commit/0cf06b79f6dcabb780ee1fa7fee0565d73789329



The resource_status() function in older pcs versions doesn't match
the lines in the crm_mon output of newer pacemaker versions.

On Wed, Feb 3, 2021 at 9:10 AM S Sathish S <s.s.sath...@ericsson.com> wrote:

Hi Team,


In latest pacemaker version 2.0.5 we are not getting "pcs status
resource" command output but in older version we used to get the
output.


Kindly let us know any already command to get pcs full list
resource.


*Latest Pacemaker version* :

pacemaker-2.0.5 -->
https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.5


corosync-2.4.4 -->
https://github.com/corosync/corosync/tree/v2.4.4


pcs-0.9.169


[root@node2 ~]# pcs status resources

[root@node2 ~]#


*Older Pacemaker version* : 


pacemaker-2.0.2 -->
https://github.com/ClusterLabs/pacemaker/tree/Pacemaker-2.0.2


corosync-2.4.4 -->
https://github.com/corosync/corosync/tree/v2.4.4


pcs-0.9.169


[root@node1 ~]# pcs status resources

TOMCAT_node1 (ocf::provider:TOMCAT_RA):  Started node1

HEALTHMONITOR_node1  (ocf::provider:HealthMonitor_RA):  
Started node1


SNMP_node1   (ocf::pacemaker:ClusterMon):    Started node1

[root@node1 ~]#


Thanks and Regards,

S Sathish S

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users


ClusterLabs home: https://www.clusterlabs.org/




-- 
Regards,


Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA



--
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/





[ClusterLabs] pcs 0.10.8 released

2021-02-02 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.8.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.8.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.8.zip

This release brings new commands for modifying corosync configuration in
an existing (and running) cluster, including crypto configuration and
keys. It also contains several bug fixes, including a fix for unwanted
logging to system log.


Complete change log for this release:
## [0.10.8] - 2021-02-01

### Added
- Support for changing corosync configuration in an existing cluster
  ([rhbz#1457314], [rhbz#1667061], [rhbz#1856397], [rhbz#1774143])
- Command to show structured corosync configuration (see `pcs cluster
  config show` command) ([rhbz#1667066])

### Fixed
- Improved error message with a hint in `pcs cluster cib-push`
  ([ghissue#241])
- Option --wait was not working with pacemaker 2.0.5+ ([ghissue#260])
- Explicitly close libcurl connections to prevent stalled TCP
  connections in CLOSE-WAIT state ([ghissue#261], [rhbz#1885841])
- Fixed parsing negative float numbers on command line ([rhbz#1869399])
- Removed unwanted logging to system log (/var/log/messages)
  ([rhbz#1917286])
- Fixed rare race condition in `pcs cluster start --wait`
  ([rhbz#1794062])
- Better error message when unable to connect to pcsd ([rhbz#1619818])

### Deprecated
- Commands `pcs config import-cman` and `pcs config export
  pcs-commands|pcs-commands-verbose` have been deprecated
  ([rhbz#1851335])
- Entering values starting with '-' (negative numbers) without '--' on
  command line is now deprecated ([rhbz#1869399])


Thanks / congratulations to everyone who contributed to this release,
including Fabio M. Di Nitto, Ivan Devat, Miroslav Lisik, Ondrej Mular
and Tomas Jelinek.

Cheers,
Tomas


[ghissue#241]: https://github.com/ClusterLabs/pcs/issues/241
[ghissue#260]: https://github.com/ClusterLabs/pcs/issues/260
[ghissue#261]: https://github.com/ClusterLabs/pcs/issues/261
[rhbz#1457314]: https://bugzilla.redhat.com/show_bug.cgi?id=1457314
[rhbz#1619818]: https://bugzilla.redhat.com/show_bug.cgi?id=1619818
[rhbz#1667061]: https://bugzilla.redhat.com/show_bug.cgi?id=1667061
[rhbz#1667066]: https://bugzilla.redhat.com/show_bug.cgi?id=1667066
[rhbz#1774143]: https://bugzilla.redhat.com/show_bug.cgi?id=1774143
[rhbz#1794062]: https://bugzilla.redhat.com/show_bug.cgi?id=1794062
[rhbz#1851335]: https://bugzilla.redhat.com/show_bug.cgi?id=1851335
[rhbz#1856397]: https://bugzilla.redhat.com/show_bug.cgi?id=1856397
[rhbz#1869399]: https://bugzilla.redhat.com/show_bug.cgi?id=1869399
[rhbz#1885841]: https://bugzilla.redhat.com/show_bug.cgi?id=1885841
[rhbz#1917286]: https://bugzilla.redhat.com/show_bug.cgi?id=1917286

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Stopping all nodes causes servers to migrate

2021-01-26 Thread Tomas Jelinek

Dne 25. 01. 21 v 17:01 Ken Gaillot napsal(a):

On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais wrote:

Hi Digimer,

On Sun, 24 Jan 2021 15:31:22 -0500
Digimer  wrote:
[...]

  I had a test server (srv01-test) running on node 1 (el8-a01n01),
and on
node 2 (el8-a01n02) I ran 'pcs cluster stop --all'.

   It appears like pacemaker asked the VM to migrate to node 2
instead of
stopping it. Once the server was on node 2, I couldn't use 'pcs
resource
disable ' as it returned that that resource was unmanaged, and
the
cluster shut down was hung. When I directly stopped the VM and then
did
a 'pcs resource cleanup', the cluster shutdown completed.


As actions during a cluster shutdown cannot be handled in the same
transition for each node, I usually add a step to disable all resources
using the property "stop-all-resources" before shutting down the cluster:

   pcs property set stop-all-resources=true
   pcs cluster stop --all

But it seems there's a very new cluster property to handle that
(IIRC, one or
two releases ago). Look at "shutdown-lock" doc:

   [...]
   some users prefer to make resources highly available only for
failures, with
   no recovery for clean shutdowns. If this option is true, resources
active on a
   node when it is cleanly shut down are kept "locked" to that node
(not allowed
   to run elsewhere) until they start again on that node after it
rejoins (or
   for at most shutdown-lock-limit, if set).
   [...]

[...]

   So as best as I can tell, pacemaker really did ask for a
migration. Is
this the case?


AFAIK, yes, because each cluster shutdown request is handled independently
at node level. There's a large door open for all kinds of race conditions
if requests are handled with some random lag on each node.


I'm going to guess that's what happened.

The basic issue is that there is no "cluster shutdown" in Pacemaker,
only "node shutdown". I'm guessing "pcs cluster stop --all" sends
shutdown requests for each node in sequence (probably via systemd), and
if the nodes are quick enough, one could start migrating off resources
before all the others get their shutdown request.


Pcs is doing its best to stop nodes in parallel. The first 
implementation of this was done back in 2015:

https://bugzilla.redhat.com/show_bug.cgi?id=1180506
Since then, we moved to using curl for network communication, which also
handles parallel cluster stop. Obviously, this doesn't ensure the stop
command arrives at and is processed on all nodes at exactly the same time.


Basically, pcs sends 'stop pacemaker' request to all nodes in parallel 
and waits for it to finish on all nodes. Then it sends 'stop corosync' 
request to all nodes in parallel. The actual stopping on each node is 
done by 'systemctl stop'.
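
Roughly, the by-hand equivalent of what pcs does would be the following,
run on every node (e.g. over ssh); the two phases must not be interleaved:

  systemctl stop pacemaker   # phase 1, on all nodes, wait for completion
  systemctl stop corosync    # phase 2, only after pacemaker is down everywhere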


Yes, the nodes which get the request sooner may start migrating resources.

Regards,
Tomas



There would be a way around it. Normally Pacemaker is shut down via
SIGTERM to pacemakerd (which is what systemctl stop does), but inside
Pacemaker it's implemented as a special "shutdown" transient node
attribute, set to the epoch timestamp of the request. It would be
possible to set that attribute for all nodes in a copy of the CIB, then
load that into the live cluster.
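
A rough, untested sketch of that idea (the nvpair id and the exact place in
the status section are assumptions):

  cibadmin --query > /tmp/cib.xml
  # for each node, add under its transient_attributes something like
  #   <nvpair id="shutdown-nodeN" name="shutdown" value="$(date +%s)"/>
  cibadmin --replace --xml-file /tmp/cib.xml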

stop-all-resources as suggested would be another way around it (and
would have to be cleared after start-up, which could be a plus or a
minus depending on how much control vs convenience you want).



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Stopping a server failed and fenced, despite disabling stop timeout

2021-01-19 Thread Tomas Jelinek

Dne 18. 01. 21 v 20:08 Digimer napsal(a):

On 2021-01-18 4:49 a.m., Tomas Jelinek wrote:

Hi Digimer,

Regarding pcs behavior:

When deleting a resource, pcs first sets its target-role to Stopped,
pushes the change into pacemaker and waits for the resource to stop.
Once the resource stops, pcs removes the resource from CIB. If pcs
simply removed the resource from CIB without stopping it first, the
resource would be running as orphaned (until pacemaker stops it if
configured to do so). We want to avoid that.

If the resource cannot be stopped for whatever reason, pcs reports this
and advises running the delete command with --force. Running 'pcs
resource delete --force' skips the part where pcs sets target role and
waits for the resource to stop, making pcs simply remove the resource
from CIB.

I agree that pcs should handle deleting unmanaged resources in a better
way. We plan to address that, but it's not on top of the priority list.
Our plan is actually to prevent deleting unmanaged resources (or require
--force to be specified to do so) based on the following scenario:

If a resource is deleted while in unmanaged state, it ends up in ORPHANED
state - it is removed from the CIB but still present in the running
configuration. This can cause various issues, e.g. when an unmanaged
resource is stopped manually outside of the cluster there might be
problems with stopping the resource upon deletion (while unmanaged),
which may end up with stonith being initiated - this is not desired.


Regards,
Tomas


This logic makes sense. If I may propose a reason for an alternative method;

In my case, the idea I was experimenting with was to remove a running
server from cluster management, without actually shutting down the
server. This is somewhat contrived, I freely admit, but the idea of
taking a server out of the config entirely without shutting it down
could be useful in some cases.

In my case, I didn't worry about the orphaned state and the risk of it
trying to start elsewhere as there are additional safeguards in place to
prevent this (both in our software and in that DRBD is not set to
dual-primary, so the VM simply can't start elsewhere while it's running
somewhere).

Totally understand it's not a priority, but when this is addressed, some
special mechanism to say "I know this will leave it orphaned and that's
OK" would be nice to have.


You can do it even now with "pcs resource delete --force". I admit it's 
not the best way and an extra flag (--dont-stop or similar) would be 
better. I wrote the idea into our notes so it doesn't get forgotten.


Tomas



digimer


Dne 18. 01. 21 v 3:11 Digimer napsal(a):

Hi all,

    Mind the slew of questions, well into testing now and finding lots of
issues. This one is two questions... :)

I set a server to be unmanaged in pacemaker while the server was
running. Then I tried to remove the resource, and it refused saying it
couldn't stop it, and to use '--force'. So I did, and the node got
fenced. Now, the resource was setup with;

pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \
   meta allow-migrate="true" target-role="started" \
   op monitor interval="60" start timeout="INFINITY" \
   on-fail="block" stop timeout="INFINITY" on-fail="block" \
   migrate_to timeout="INFINITY"

    I would have expected the 'stop timeout="INFINITY" on-fail="block"' to
prevent fencing if the server failed to stop (question 1) and that if a
resource was unmanaged, that the resource wouldn't even try to stop
(question 2).

    Can someone help me understand what happened here?

digimer

More below;


[root@el8-a01n01 ~]# pcs resource remove srv01-test
Attempting to stop: srv01-test... Warning: 'srv01-test' is unmanaged
Error: Unable to stop: srv01-test before deleting (re-run with --force
to force deletion)
[root@el8-a01n01 ~]# pcs resource remove srv01-test --force
Deleting Resource - srv01-test
[root@el8-a01n01 ~]# client_loop: send disconnect: Broken pipe


    As you can see, the node was fenced. The logs on that node were;


Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-execd[1872]:  warning:
srv01-test_stop_0 process (PID 113779) timed out
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-execd[1872]:  warning:
srv01-test_stop_0[113779] timed out after 2ms
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-controld[1875]:  error:
Result of stop operation for srv01-test on el8-a01n01: Timed Out
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-controld[1875]:  notice:
el8-a01n01-srv01-test_stop_0:37 [ The server: [srv01-test] is indeed
running. It will be shut down now.\n ]
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]:  notice:
Setting fail-count-srv01-test#stop_0[el8-a01n01]: (unset) -> INFINITY
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]:  notice:
Setting last-failure-srv01-test#stop_0[el8-a01n01]:

Re: [ClusterLabs] Stopping a server failed and fenced, despite disabling stop timeout

2021-01-18 Thread Tomas Jelinek

Hi Digimer,

Regarding pcs behavior:

When deleting a resource, pcs first sets its target-role to Stopped, 
pushes the change into pacemaker and waits for the resource to stop. 
Once the resource stops, pcs removes the resource from CIB. If pcs 
simply removed the resource from CIB without stopping it first, the 
resource would be running as orphaned (until pacemaker stops it if 
configured to do so). We want to avoid that.


If the resource cannot be stopped for whatever reason, pcs reports this 
and advises running the delete command with --force. Running 'pcs 
resource delete --force' skips the part where pcs sets target role and 
waits for the resource to stop, making pcs simply remove the resource 
from CIB.


I agree that pcs should handle deleting unmanaged resources in a better 
way. We plan to address that, but it's not on top of the priority list. 
Our plan is actually to prevent deleting unmanaged resources (or require 
--force to be specified to do so) based on the following scenario:


If a resource is deleted while in unmanaged state, it ends up in ORPHANED
state - it is removed from the CIB but still present in the running
configuration. This can cause various issues, e.g. when an unmanaged
resource is stopped manually outside of the cluster there might be
problems with stopping the resource upon deletion (while unmanaged),
which may end up with stonith being initiated - this is not desired.



Regards,
Tomas


Dne 18. 01. 21 v 3:11 Digimer napsal(a):

Hi all,

   Mind the slew of questions, well into testing now and finding lots of
issues. This one is two questions... :)

I set a server to be unmanaged in pacemaker while the server was
running. Then I tried to remove the resource, and it refused saying it
couldn't stop it, and to use '--force'. So I did, and the node got
fenced. Now, the resource was setup with;

pcs resource create srv07-el6 ocf:alteeve:server name="srv07-el6" \
  meta allow-migrate="true" target-role="started" \
  op monitor interval="60" start timeout="INFINITY" \
  on-fail="block" stop timeout="INFINITY" on-fail="block" \
  migrate_to timeout="INFINITY"

   I would have expected the 'stop timeout="INFINITY" on-fail="block"' to
prevent fencing if the server failed to stop (question 1) and that if a
resource was unmanaged, that the resource wouldn't even try to stop
(question 2).

   Can someone help me understand what happened here?

digimer

More below;


[root@el8-a01n01 ~]# pcs resource remove srv01-test
Attempting to stop: srv01-test... Warning: 'srv01-test' is unmanaged
Error: Unable to stop: srv01-test before deleting (re-run with --force
to force deletion)
[root@el8-a01n01 ~]# pcs resource remove srv01-test --force
Deleting Resource - srv01-test
[root@el8-a01n01 ~]# client_loop: send disconnect: Broken pipe


   As you can see, the node was fenced. The logs on that node were;


Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-execd[1872]:  warning:
srv01-test_stop_0 process (PID 113779) timed out
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-execd[1872]:  warning:
srv01-test_stop_0[113779] timed out after 2ms
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-controld[1875]:  error:
Result of stop operation for srv01-test on el8-a01n01: Timed Out
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-controld[1875]:  notice:
el8-a01n01-srv01-test_stop_0:37 [ The server: [srv01-test] is indeed
running. It will be shut down now.\n ]
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]:  notice:
Setting fail-count-srv01-test#stop_0[el8-a01n01]: (unset) -> INFINITY
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]:  notice:
Setting last-failure-srv01-test#stop_0[el8-a01n01]: (unset) -> 1610935435
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]:  notice:
Setting fail-count-srv01-test#stop_0[el8-a01n01]: INFINITY -> (unset)
Jan 18 02:03:55 el8-a01n01.alteeve.ca pacemaker-attrd[1873]:  notice:
Setting last-failure-srv01-test#stop_0[el8-a01n01]: 1610935435 -> (unset)
client_loop: send disconnect: Broken pipe


On the peer node, the logs showed;


Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
notice: State transition S_IDLE -> S_POLICY_ENGINE
Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]:
notice: Calculated transition 58, saving inputs in
/var/lib/pacemaker/pengine/pe-input-100.bz2
Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
notice: Transition 58 (Complete=0, Pending=0, Fired=0, Skipped=0,
Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-100.bz2): Complete
Jan 18 02:03:13 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
notice: State transition S_TRANSITION_ENGINE -> S_IDLE
Jan 18 02:03:18 el8-a01n02.alteeve.ca pacemaker-controld[490050]:
notice: State transition S_IDLE -> S_POLICY_ENGINE
Jan 18 02:03:18 el8-a01n02.alteeve.ca pacemaker-schedulerd[490049]:
notice: Calculated transition 59, saving inputs in
/var/lib/pacemaker/pengine/pe-input-101.bz2
Jan 18 

Re: [ClusterLabs] Q: Starting from pcs-0.10.6-4.el8, logs of pcsd are now output to syslog

2021-01-08 Thread Tomas Jelinek

Hi,

Thank you for reporting the issue and also for testing the fix so 
quickly and confirming it works.


Regards,
Tomas


Dne 08. 01. 21 v 5:39 井上和徳 napsal(a):

Hi,

Thanks for your reply.
I've backported ddb0d3f to pcs-0.10.6-4.el8 and confirmed that this
bug is fixed.

Thanks,
Kazunori INOUE

On Fri, Jan 8, 2021 at 1:49 AM Tomas Jelinek  wrote:


Hi,

It took us some time to figure this out, sorry about that.

The behavior you see is not intended, it is a bug. The bug originates in
commit 966959ac54d80c4cdeeb0fac40dc7ea60c1a0a82, more specifically in
this line in pcs/run.py:
from pcs.app import main as cli

The pcs/app.py file is responsible for starting pcs CLI. The pcs/run.py
file is responsible for starting pcs daemon, pcs CLI and pcs SNMP agent
in a unified way and it serves as an entry point. Importing pcs.app
caused logging.basicConfig(), which is called in pcs/app.py, to be
executed when starting pcs daemon and pcs SNMP agent. This
unintentionally configured loggers in pcs daemon and pcs SNMP agent to
log into stderr and those logs then got propagated to system log and
/var/log/messages.

To fix the bug, logging.basicConfig() should be removed from pcs/app.py,
the line is actually not needed at all. The fix is available upstream in
commit ddb0d3fed3273181356cd638d724b891ecd78263.


Regards,
Tomas


Dne 23. 12. 20 v 5:25 井上和徳 napsal(a):

Hi!

Is it a specification that pcsd (pcs-0.10.6-4.el8) outputs logs to syslog?
If it is a specification, which change/commit is it due to?

# rpm -q pcs
pcs-0.10.6-4.el8.x86_64
#
# cat /var/log/pcsd/pcsd.log
I, [2020-12-23T13:17:36.748 #00010] INFO -- : Starting Daemons
I, [2020-12-23T13:17:36.749 #00010] INFO -- : Running:
/usr/sbin/pcs cluster start
I, [2020-12-23T13:17:36.749 #00010] INFO -- : CIB USER: hacluster, groups:
I, [2020-12-23T13:17:39.749 #00010] INFO -- : Return Value: 0
I, [2020-12-23T13:17:39.749 #0] INFO -- : 200 GET
/remote/cluster_start (192.168.122.247) 3243.41ms
I, [2020-12-23T13:18:47.049 #0] INFO -- : 200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.140) 4.05ms
I, [2020-12-23T13:18:47.060 #00011] INFO -- : Config files sync started
I, [2020-12-23T13:18:47.061 #00011] INFO -- : SRWT Node: rhel83-2
Request: get_configs
I, [2020-12-23T13:18:47.061 #00011] INFO -- : Connecting to:
https://192.168.122.140:2224/remote/get_configs?cluster_name=my_cluster
I, [2020-12-23T13:18:47.061 #00011] INFO -- : SRWT Node: rhel83-1
Request: get_configs
I, [2020-12-23T13:18:47.061 #00011] INFO -- : Connecting to:
https://192.168.122.247:2224/remote/get_configs?cluster_name=my_cluster
I, [2020-12-23T13:18:47.061 #00011] INFO -- : Config files sync finished
#
# grep pcs /var/log/messages
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Starting Daemons
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Running: /usr/sbin/pcs cluster start
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:CIB USER: hacluster, groups:
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Return Value: 0
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:tornado.access:200 GET /remote/cluster_start
(192.168.122.247) 3243.41ms
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:tornado.access:200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.140) 4.05ms
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Config files sync started
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:SRWT Node: rhel83-2 Request: get_configs
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Connecting to:
https://192.168.122.140:2224/remote/get_configs?cluster_name=my_cluster
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:SRWT Node: rhel83-1 Request: get_configs
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Connecting to:
https://192.168.122.247:2224/remote/get_configs?cluster_name=my_cluster
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Config files sync finished
#

Up to pcs-0.10.4-6.el8, there was no output to syslog.

# rpm -q pcs
pcs-0.10.4-6.el8.x86_64
#
# cat /var/log/pcsd/pcsd.log
I, [2020-12-23T13:17:36.059 #01200] INFO -- : Starting Daemons
I, [2020-12-23T13:17:36.060 #01200] INFO -- : Running:
/usr/sbin/pcs cluster start
I, [2020-12-23T13:17:36.060 #01200] INFO -- : CIB USER: hacluster, groups:
I, [2020-12-23T13:17:38.060 #01200] INFO -- : Return Value: 0
I, [2020-12-23T13:17:38.060 #0] INFO -- : 200 GET
/remote/cluster_start (192.168.122.247) 1650.38ms
I, [2020-12-23T13:18:46.991 #0] INFO -- : 200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.140) 5.06ms
I, [2020-12-23T13:18:58.053 #0] INFO -- : 200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.247) 5.74ms
I, [2020-12-23T13:18:58.064 #01201] INFO -- : Config files sync started
I, [2020-12-23T13:18:58.064 #01201] INFO -- : SRWT Node: rhel83-2
Request: get_configs
I, [2020-12-23T13:18:58.064 #01201] INFO -- : Connecting to:
https

Re: [ClusterLabs] Q: Starting from pcs-0.10.6-4.el8, logs of pcsd are now output to syslog

2021-01-07 Thread Tomas Jelinek

Hi,

It took us some time to figure this out, sorry about that.

The behavior you see is not intended, it is a bug. The bug originates in 
commit 966959ac54d80c4cdeeb0fac40dc7ea60c1a0a82, more specifically in 
this line in pcs/run.py:

from pcs.app import main as cli

The pcs/app.py file is responsible for starting pcs CLI. The pcs/run.py 
file is responsible for starting pcs daemon, pcs CLI and pcs SNMP agent 
in a unified way and it serves as an entry point. Importing pcs.app 
caused logging.basicConfig(), which is called in pcs/app.py, to be 
executed when starting pcs daemon and pcs SNMP agent. This 
unintentionally configured loggers in pcs daemon and pcs SNMP agent to 
log into stderr and those logs then got propagated to system log and 
/var/log/messages.


To fix the bug, logging.basicConfig() should be removed from pcs/app.py, 
the line is actually not needed at all. The fix is available upstream in 
commit ddb0d3fed3273181356cd638d724b891ecd78263.



Regards,
Tomas


Dne 23. 12. 20 v 5:25 井上和徳 napsal(a):

Hi!

Is it a specification that pcsd (pcs-0.10.6-4.el8) outputs logs to syslog?
If it is a specification, which change/commit is it due to?

# rpm -q pcs
pcs-0.10.6-4.el8.x86_64
#
# cat /var/log/pcsd/pcsd.log
I, [2020-12-23T13:17:36.748 #00010] INFO -- : Starting Daemons
I, [2020-12-23T13:17:36.749 #00010] INFO -- : Running:
/usr/sbin/pcs cluster start
I, [2020-12-23T13:17:36.749 #00010] INFO -- : CIB USER: hacluster, groups:
I, [2020-12-23T13:17:39.749 #00010] INFO -- : Return Value: 0
I, [2020-12-23T13:17:39.749 #0] INFO -- : 200 GET
/remote/cluster_start (192.168.122.247) 3243.41ms
I, [2020-12-23T13:18:47.049 #0] INFO -- : 200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.140) 4.05ms
I, [2020-12-23T13:18:47.060 #00011] INFO -- : Config files sync started
I, [2020-12-23T13:18:47.061 #00011] INFO -- : SRWT Node: rhel83-2
Request: get_configs
I, [2020-12-23T13:18:47.061 #00011] INFO -- : Connecting to:
https://192.168.122.140:2224/remote/get_configs?cluster_name=my_cluster
I, [2020-12-23T13:18:47.061 #00011] INFO -- : SRWT Node: rhel83-1
Request: get_configs
I, [2020-12-23T13:18:47.061 #00011] INFO -- : Connecting to:
https://192.168.122.247:2224/remote/get_configs?cluster_name=my_cluster
I, [2020-12-23T13:18:47.061 #00011] INFO -- : Config files sync finished
#
# grep pcs /var/log/messages
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Starting Daemons
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Running: /usr/sbin/pcs cluster start
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:CIB USER: hacluster, groups:
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Return Value: 0
Dec 23 13:17:39 rhel83-2 
pcsd[600350]:INFO:tornado.access:200 GET /remote/cluster_start
(192.168.122.247) 3243.41ms
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:tornado.access:200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.140) 4.05ms
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Config files sync started
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:SRWT Node: rhel83-2 Request: get_configs
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Connecting to:
https://192.168.122.140:2224/remote/get_configs?cluster_name=my_cluster
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:SRWT Node: rhel83-1 Request: get_configs
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Connecting to:
https://192.168.122.247:2224/remote/get_configs?cluster_name=my_cluster
Dec 23 13:18:47 rhel83-2 
pcsd[600350]:INFO:pcs.daemon:Config files sync finished
#

Up to pcs-0.10.4-6.el8, there was no output to syslog.

# rpm -q pcs
pcs-0.10.4-6.el8.x86_64
#
# cat /var/log/pcsd/pcsd.log
I, [2020-12-23T13:17:36.059 #01200] INFO -- : Starting Daemons
I, [2020-12-23T13:17:36.060 #01200] INFO -- : Running:
/usr/sbin/pcs cluster start
I, [2020-12-23T13:17:36.060 #01200] INFO -- : CIB USER: hacluster, groups:
I, [2020-12-23T13:17:38.060 #01200] INFO -- : Return Value: 0
I, [2020-12-23T13:17:38.060 #0] INFO -- : 200 GET
/remote/cluster_start (192.168.122.247) 1650.38ms
I, [2020-12-23T13:18:46.991 #0] INFO -- : 200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.140) 5.06ms
I, [2020-12-23T13:18:58.053 #0] INFO -- : 200 GET
/remote/get_configs?cluster_name=my_cluster (192.168.122.247) 5.74ms
I, [2020-12-23T13:18:58.064 #01201] INFO -- : Config files sync started
I, [2020-12-23T13:18:58.064 #01201] INFO -- : SRWT Node: rhel83-2
Request: get_configs
I, [2020-12-23T13:18:58.064 #01201] INFO -- : Connecting to:
https://192.168.122.140:2224/remote/get_configs?cluster_name=my_cluster
I, [2020-12-23T13:18:58.064 #01201] INFO -- : SRWT Node: rhel83-1
Request: get_configs
I, [2020-12-23T13:18:58.064 #01201] INFO -- : Connecting to:
https://192.168.122.247:2224/remote/get_configs?cluster_name=my_cluster
I, [2020-12-23T13:18:58.065 #01201] INFO -- : 

Re: [ClusterLabs] Can't have 2 nodes as master with galera resource agent

2020-12-11 Thread Tomas Jelinek

Dne 11. 12. 20 v 15:10 Andrei Borzenkov napsal(a):

11.12.2020 16:13, Raphael Laguerre пишет:

Hello,

I'm trying to setup a 2 nodes cluster with 2 galera instances. I use the ocf:heartbeat:galera 
resource agent, however, after I create the resource, only one node appears to be in master role, 
the other one can't be promoted and stays in slave role. I expect to have both nodes with a mysqld 
instance running and synchronized in a galera cluster. Could you help me please ? When I do a 
debug-promote, it seems that mysqld is started on node-01 and shutdowned juste after, but I don't 
understand why. If I launch the galera cluster manually by doing on one node 
"galera_new_cluster" and on the second node "systemctl start mariadb", it works 
properly (I can't write on both nodes and they are synchronized)

Here is the scenario that led to the current situation:
I did :

pcs resource create r_galera ocf:heartbeat:galera enable_creation=true 
wsrep_cluster_address="gcomm://192.168.0.1,192.168.0.2" 
cluster_host_map="node-01:192.168.0.1;node-02:192.168.0.2" promotable meta master-max=2 
promoted-max=2



Try it like this:
pcs resource create r_galera ocf:heartbeat:galera enable_creation=true 
wsrep_cluster_address="gcomm://192.168.0.1,192.168.0.2" 
cluster_host_map="node-01:192.168.0.1;node-02:192.168.0.2" promotable 
master-max=2 promoted-max=2

i.e. drop "meta" after "promotable"

Options written after "meta" go to the primitive resource, options 
written after "promotable" (or "clone") go to the promotable (or clone) 
resource.




Promotable, promoted-max must be set on clone, not on primitive. From logs:

[quoted CIB XML not preserved in the archive]

Those are resource (primitive) attributes

[quoted CIB XML not preserved in the archive]


And clone attributes are default (1 master) so pacemaker promotes only
one, the first, node.

Dec 11 11:35:23 node-02 pacemaker-schedulerd[5304] (color_promotable)
info: r_galera-clone: Promoted 1 instances of a possible 1 to master

Resource on second node correctly sets master score, but pacemaker
cannot promote more than one node.

Your pcs invocation lacks the --master switch (and in general it looks
strange; I am not sure how you managed to create a clone with this
command, but I am not familiar enough with pcs):

pcs resource create r_galera ocf:heartbeat:galera ... --master meta
master-max=2 promoted-max=2


I guess Raphael is using pcs-0.10.x which brings a new syntax. There is 
no --master in pcs-0.10.x.



Regards,
Tomas





and I got:


root@node-01:~# pcs status
Cluster name: cluster-ha-mariadb
Stack: corosync
Current DC: node-02 (version 2.0.1-9e909a5bdd) - partition with quorum
Last updated: Fri Dec 11 11:38:12 2020
Last change: Fri Dec 11 11:35:18 2020 by root via cibadmin on node-01

2 nodes configured
3 resources configured

Online: [ node-01 node-02 ]

Full list of resources:

r_vip (ocf::heartbeat:IPaddr2): Started node-01
Clone Set: r_galera-clone [r_galera] (promotable)
Masters: [ node-02 ]
Slaves: [ node-01 ]

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled


Please find attached the cib.xml, the pacemaker logs, the syslog logs and the
mysql logs from the time of the creation of the resource for node-01 and
node-02. There are no mysql logs generated after the resource creation on
node-01.

Here are info about my environment and configuration (except for IP and 
hostname, both nodes are identical) :


root@node-01:~# cat /etc/debian_version
10.7

root@node-01:~# uname -a
Linux node-01 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 
GNU/Linux

root@node-01:~# dpkg -l corosync pacemaker pcs pacemaker-cli-utils 
mariadb-server
Souhait=inconnU/Installé/suppRimé/Purgé/H=à garder
| 
État=Non/Installé/fichier-Config/dépaqUeté/échec-conFig/H=semi-installé/W=attend-traitement-déclenchements
|/ Err?=(aucune)/besoin Réinstallation (État,Err: majuscule=mauvais)
||/ Nom Version Architecture Description
+++-===-===--=
ii corosync 3.0.1-2+deb10u1 amd64 cluster engine daemon and utilities
ii mariadb-server 1:10.3.27-0+deb10u1 all MariaDB database server (metapackage 
depending on the latest version)
ii pacemaker 2.0.1-5+deb10u1 amd64 cluster resource manager
ii pacemaker-cli-utils 2.0.1-5+deb10u1 amd64 cluster resource manager command 
line utilities
ii pcs 0.10.1-2 all Pacemaker Configuration System

root@node-01:~# cat /etc/mysql/mariadb.conf.d/50-galera.cnf
[galera]

wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_cluster_address = gcomm://192.168.0.1,192.168.0.2
#wsrep_cluster_address = 

Re: [ClusterLabs] Unable to connect to node , no token available

2020-11-06 Thread Tomas Jelinek

Hi Raffaele,

Several bugs related to node authentication have been fixed between the
Debian 9 and Debian 10 versions of pcs. I think you were hitting one of
those.


Regards,
Tomas


Dne 02. 11. 20 v 11:06 Raffaele Pantaleoni napsal(a):

Hi,

I have solved the issue, well...not really though.

I "simply" migrated the VMs from Debian 9 to Debian 10 and updated the 
Pacemaker stack to the latest versions available in Debian Buster.


The problems simply disappeared and everything is working as expected 
now, out of the box.



On 28/10/20 10:17, Raffaele Pantaleoni wrote:

Hello everybody,

I'm trying to add an existing cluster into the pcsd web gui.

When I try to add it, I select a node (SRVDRSW01.seko.com) and right 
after I am requested to enter the password for the hacluster user but 
from the pcsd log file I can see this:


I, [2020-10-28T09:58:26.213638 #12661]  INFO -- : Running: id -Gn 
hacluster
I, [2020-10-28T09:58:26.214091 #12661]  INFO -- : CIB USER: hacluster, 
groups:

I, [2020-10-28T09:58:26.222338 #12661]  INFO -- : Return Value: 0
I, [2020-10-28T09:58:26.222730 #12661]  INFO -- : Running: id -Gn 
hacluster
I, [2020-10-28T09:58:26.222887 #12661]  INFO -- : CIB USER: hacluster, 
groups:

I, [2020-10-28T09:58:26.230203 #12661]  INFO -- : Return Value: 0
I, [2020-10-28T09:58:26.230424 #12661]  INFO -- : Running: 
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2020-10-28T09:58:26.230587 #12661]  INFO -- : CIB USER: hacluster, 
groups:

I, [2020-10-28T09:58:26.251826 #12661]  INFO -- : Return Value: 0
I, [2020-10-28T09:58:26.253144 #12661]  INFO -- : SRWT Node: 
SRVDRSW01.seko.com Request: check_auth
E, [2020-10-28T09:58:26.253603 #12661] ERROR -- : Unable to connect to 
node SRVDRSW01.seko.com, no token available


I tried it using the node name only (SRVDRSW01) or the IP address. The 
error message doesn't change.


Since this is a test installation inside our LAN there is no firewall 
configured at all.


This is the actual status of the cluster:
root@SRVDRSW01:~# pcs status
Cluster name: debian
Stack: corosync
Current DC: SRVDRSW01 (version 1.1.16-94ff4df) - partition with quorum
Last updated: Wed Oct 28 09:55:20 2020
Last change: Mon Oct 19 17:07:33 2020 by root via crm_attribute on 
SRVDRSW01


6 nodes configured
5 resources configured

Online: [ SRVDRSW01 SRVDRSW02 SRVDRSW03 SRVDRSW04 SRVDRSW05 SRVDRSW06 ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started SRVDRSW01
 CouchIP    (ocf::heartbeat:IPaddr2):   Started SRVDRSW03
 NodeJSIP   (ocf::heartbeat:IPaddr2):   Started SRVDRSW04
 FrontEnd   (ocf::heartbeat:nginx): Started SRVDRSW01
 ITATESTSERVER-DIP  (ocf::nodejs:pm2):  Started SRVDRSW04

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/disabled


I went through hundreds of tries with no luck.
Is it something I am doing wrong or I forgot to do?
I tried to google it but no solution was good.
I have no ideas left.

Thank you in advance.
---
Raffaele Pantaleoni

/
/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

--
Signature

*Raffaele Pantaleoni*

*IOT Design R Unit*

Tel. +39 0746 605885

Fax.    +39 0746 607072

Mail *r.pantale...@seko.com *

Skype r.pantaleoni.seko

*www.seko.com*  **

/This e-mail (including attachments) is intended only for the 
recipient(s) named above. It may contain confidential or privileged 
information. If you are not the named addressee you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately by e-mail if you have received this e-mail by mistake and 
delete this e-mail from your system. E-mail transmission cannot be 
guaranteed to be secure or error-free as information could be 
intercepted, corrupted, lost, destroyed, arrive late or incomplete, or 
contain viruses. The sender therefore does not accept liability for any 
errors or omissions in the contents of this message, which arise as a 
result of e-mail transmission. If verification is required please 
request a hard-copy version. Your personal data will be processed in 
compliance with the EU General Data Protection Regulation n. 2016/679 
("GDPR"), applicable since May 25th, 2018. For further information, 
please see our Privacy Policy /



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Unable to connect to node , no token available

2020-10-29 Thread Tomas Jelinek

Hi Raffaele,

We'll need to know more details:
* What is your pcs version?
* Which node is used to access web UI?
* Which node is the log from?

The part of pcsd log you provided doesn't seem to cover the issue you 
are experiencing. If the node is not authenticated, which seems to be 
the case as pcsd is asking for a password, then the log simply states 
the node is not authenticated.


We would need a bigger portion of the log to see what's happening when 
you are trying to add the cluster to the web UI.



Thanks,
Tomas


Dne 29. 10. 20 v 8:02 Raffaele Pantaleoni napsal(a):

The node is already part of that cluster indeed. That's why I clicked on

"+Add Existing"

the pop-up form says:

"Enter the hostname/IP of a node in a cluster that you would like to 
manage:"


But I can't add it into the pcsd web gui.

That is, the cluster list is empty.

On 28/10/20 21:59, Strahil Nikolov wrote:
Are you sure that the node you want to join is not one of the listed 
here:


Online: [ SRVDRSW01 SRVDRSW02 SRVDRSW03 SRVDRSW04 SRVDRSW05 SRVDRSW06 ]

The hostname looks pretty much the same.

Best Regards,
Strahil Nikolov






В сряда, 28 октомври 2020 г., 11:18:11 Гринуич+2, Raffaele Pantaleoni 
 написа:






Hello everybody,

I'm trying to add an existing cluster into the pcsd web gui.

When I try to add it, I select a node (SRVDRSW01.seko.com) and right
after I am requested to enter the password for the hacluster user but
from the pcsd log file I can see this:

I, [2020-10-28T09:58:26.213638 #12661]  INFO -- : Running: id -Gn 
hacluster

I, [2020-10-28T09:58:26.214091 #12661]  INFO -- : CIB USER: hacluster,
groups:
I, [2020-10-28T09:58:26.222338 #12661]  INFO -- : Return Value: 0
I, [2020-10-28T09:58:26.222730 #12661]  INFO -- : Running: id -Gn 
hacluster

I, [2020-10-28T09:58:26.222887 #12661]  INFO -- : CIB USER: hacluster,
groups:
I, [2020-10-28T09:58:26.230203 #12661]  INFO -- : Return Value: 0
I, [2020-10-28T09:58:26.230424 #12661]  INFO -- : Running:
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2020-10-28T09:58:26.230587 #12661]  INFO -- : CIB USER: hacluster,
groups:
I, [2020-10-28T09:58:26.251826 #12661]  INFO -- : Return Value: 0
I, [2020-10-28T09:58:26.253144 #12661]  INFO -- : SRWT Node:
SRVDRSW01.seko.com Request: check_auth
E, [2020-10-28T09:58:26.253603 #12661] ERROR -- : Unable to connect to
node SRVDRSW01.seko.com, no token available

I tried it using the node name only (SRVDRSW01) or the IP address. The
error message doesn't change.

Since this is a test installation inside our LAN there is no firewall
configured at all.

This is the actual status of the cluster:
root@SRVDRSW01:~# pcs status
Cluster name: debian
Stack: corosync
Current DC: SRVDRSW01 (version 1.1.16-94ff4df) - partition with quorum
Last updated: Wed Oct 28 09:55:20 2020
Last change: Mon Oct 19 17:07:33 2020 by root via crm_attribute on 
SRVDRSW01


6 nodes configured
5 resources configured

Online: [ SRVDRSW01 SRVDRSW02 SRVDRSW03 SRVDRSW04 SRVDRSW05 SRVDRSW06 ]

Full list of resources:

  ClusterIP  (ocf::heartbeat:IPaddr2):   Started SRVDRSW01
  CouchIP    (ocf::heartbeat:IPaddr2):   Started SRVDRSW03
  NodeJSIP   (ocf::heartbeat:IPaddr2):   Started SRVDRSW04
  FrontEnd   (ocf::heartbeat:nginx): Started SRVDRSW01
  ITATESTSERVER-DIP  (ocf::nodejs:pm2):  Started SRVDRSW04

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/disabled


I went through hundreds of tries with no luck.
Is it something I am doing wrong or I forgot to do?
I tried to google it but no solution was good.
I have no ideas left.

Thank you in advance.
---
Raffaele Pantaleoni

/
/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Adding a node to an active cluster

2020-10-29 Thread Tomas Jelinek

Hi Jiagi,

Yes, 'pcs cluster node add' can add a node to a cluster which is already 
running resources.
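
For example with pcs 0.10.x (node name and address are placeholders), run
from an existing cluster node:

  pcs host auth newnode addr=192.0.2.30
  pcs cluster node add newnode --start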


Regards,
Tomas


Dne 28. 10. 20 v 22:09 Strahil Nikolov napsal(a):

pcsd is another layer on top of the crmd and thus you need to use the pcs
commands to manage the cluster (otherwise the outer layer will be out of sync
with the inner layers).

pcs cluster auth is designed to be used to add a node to a cluster. If you are
worried about the result, put the cluster in global maintenance before that.
You've got 2 steps:
- pcs cluster auth -> allows the pcsd on the new node to communicate with the
pcsd daemons on the other members of the cluster
- pcs cluster node add -> adds the node to the cluster


Best Regards,
Strahil Nikolov






В сряда, 28 октомври 2020 г., 22:04:25 Гринуич+2, Jiaqi Tian1 
 написа:





Hi,

Is "pcs cluster auth" command capable of add a host when the cluster already 
has resources running? Also, is there a crmsh version of this command? I didn't find this 
for crmsh.

  


Thanks,

Jiaqi

  


- Original message -
From: Strahil Nikolov 
Sent by: "Users" 
To: "users@clusterlabs.org" 
Cc:
Subject: [EXTERNAL] Re: [ClusterLabs] Adding a node to an active cluster
Date: Tue, Oct 27, 2020 1:07 PM
   
On RHEL, I would use "pcs cluster auth"/"pcs host auth" && "pcs cluster node add".For cluster nodes auth , you can check : https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/high-availability-and-clusters_considerations-in-adopting-rhel-8#new_commands_for_authenticating_nodes_in_a_cluster Best Regards,Strahil NikolovВ вторник, 27 октомври 2020 г., 18:06:06 Гринуич+2, Jiaqi Tian1  написа:Hi Xin,Thank you. The crmsh version is 4.1.0.0, OS is RHEL 8.0. I have tried crm cluster init -y, but it seems it cannot be run when the cluster is already up and running with resources running on it. is crm cluster join command used for this situation?  Thanks,Jiaqi > - Original message -> From: Xin Liang > Sent by: "Users" > To: "users@clusterlabs.org" > Cc:>> Subject: [EXTERNAL] Re: [ClusterLabs] Adding a node to an active cluster> Date: Tue, Oct 27, 2020 3:29 AM>  >> Hi Jiaqi,>>  >> Which OS version do you use and which crmsh version do you use?> I highly recommend you update your crmsh to latest version.>> Besides that, did you mean you already have ceha03 and ceha04 nodes running the cluster service?>> From ha-cluster-bootstrapceha03.log, I can't see you have record of init cluster successfully.>>  >> Ideally, you should run:>> 1. on ceha03:    crm cluster init -y> 2. on ceha01:   crm cluster join -c ceha03 -y>>  > > From: Users  on behalf of Jiaqi Tian1 > Sent: Tuesday, October 27, 2020 12:15 AM> To: users@clusterlabs.org > Subject: Re: [ClusterLabs] Adding a node to an active cluster>  >> Hi Xin,>> I have ceha03 and ceha04 in cluster and trying to add ceha01 to the cluster. Running crm cluster join -c ceha03 -y on ceha01. Here are logs in ceha03 and ceha01. the log file in ceha04 is empty.>>  >> Thanks,>> Jiaqi>>  >>> - Original message ->> From: Xin Liang >> Sent by: "Users" >> To: "users@clusterlabs.org" >> Cc:>> Subject: [EXTERNAL] Re: [ClusterLabs] Adding a node to an active cluster>> Date: Mon, Oct 26, 2020 8:43 AM>>  >> Hi Jiaqi   Could you give me your "/var/log/crmsh/ha-cluster-bootstrap.log" or "/var/log/ha-cluster-bootstrap.log" on these 3 nodes?   Thanks  >> >> From: Users  on behalf of Jiaqi Tian1 >> Sent: Saturday, October 24, 2020 5:48 AM>> To: users@clusterlabs.org >> Subject: Re: [ClusterLabs] Adding a node to an active cluster>>   Hi, Thank you for your suggestion. The case I have is, I have host1 and host2 in cluster that has resources running, then I try to join host3 to the cluster by running "crm cluster join -c host1 -y". But I get this "Configuring csync2...ERROR: cluster.join: Can't invoke crm cluster init init csync2_remote on host3" issue. Are there any other requirements for running this command?   Thanks   Jiaqi Tian  > - Original message ->>> From: Xin Liang >>> Sent by: "Users" >>> To: Cluster Labs - All topics related to open-source clustering welcomed >>> Cc:>>> Subject: [EXTERNAL] Re: [ClusterLabs] Adding a node to an active cluster>>> Date: Wed, Oct 21, 2020 9:44 PM>>>  >>> Hi Jiaqi,>>  >> Assuming you already have node1 running resources, you can try to run this command on node2:>>  >> "crm cluster join -c node1 -y">>  >>> >>> From: Users  on behalf of Jiaqi Tian1 >>> Sent: Wednesday, October 21, 2020 10:03 PM>>> To: users@clusterlabs.org >>> Subject: [ClusterLabs] Adding a node to an active cluster>>>  >> Hi,>> I'm trying to add a new node into an active pacemaker cluster with resources up and running.>> After steps:>> 1. update corosync.conf files among all hosts in cluster including the 

Re: [ClusterLabs] Adding a node to an active cluster

2020-10-22 Thread Tomas Jelinek

Hi,

Have you reloaded corosync configuration after modifying corosync.conf? 
I don't see it in your procedure. Modifying the config file has no 
effect until you reload it with 'corosync-cfgtool -R'.
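
For example, after copying the updated corosync.conf to every node, reload
it (if your corosync version does not propagate the reload cluster-wide,
run it on each node):

  corosync-cfgtool -R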


It should not be needed to modify CIB and add the new node manually. If 
everything is done correctly and the new node joins corosync membership, 
pacemaker should figure out there is a new node in the cluster by itself.


Anyway, using pcs or crmsh is highly recommended as others already said.

Regards,
Tomas


Dne 21. 10. 20 v 16:03 Jiaqi Tian1 napsal(a):

Hi,
I'm trying to add a new node into an active pacemaker cluster with 
resources up and running.

After steps:
1. update corosync.conf files among all hosts in cluster including the 
new node

2. copy corosync auth file to the new node
3. enable corosync and pacemaker on the new node
4. add the new node to the list of nodes in /var/lib/pacemaker/cib/cib.xml
Then I run crm status, and the new node is displayed as offline. It will not 
become online unless we restart corosync and pacemaker on all nodes in the 
cluster. But this is not what we want, since we want to keep the existing 
nodes and resources up and running. Also, in this case crm_node -l 
doesn't list the new node.

So my question is:
1. Is there another approach to make the existing nodes aware of the new 
node and have crm status indicate the node is online, while keeping the 
other nodes and resources up and running?

2. which config file crm_node command reads?
Thanks,
Jiaqi Tian


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.10.7 released

2020-10-02 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.7.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.7.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.7.zip

Complete change log for this release:
## [0.10.7] - 2020-09-30

### Added
- Support for multiple sets of resource and operation defaults,
  including support for rules ([rhbz#1222691], [rhbz#1817547],
  [rhbz#1862966], [rhbz#1867516], [rhbz#1869399])
- Support for "demote" value of resource operation's "on-fail" option
  ([rhbz#1843079])
- Support for 'number' type in rules ([rhbz#1869399])
- It is possible to set custom (promotable) clone id in `pcs resource
  create` and `pcs resource clone/promotable` commands ([rhbz#1741056])

### Fixed
- Prevent removing non-empty tag by removing tagged resource group or
  clone ([rhbz#1857295])
- Clarify documentation for 'resource move' and 'resource ban' commands
  with regards to the 'lifetime' option.
- Allow moving both promoted and demoted promotable clone resources
  ([rhbz#1875301])
- Improved error message with a hint in `pcs cluster cib-push`
  ([ghissue#241])

### Deprecated
- `pcs resource [op] defaults <name>=<value>...` commands are deprecated
  now. Use `pcs resource [op] defaults update <name>=<value>...` if you
  only manage one set of defaults, or `pcs resource [op] defaults set`
  if you manage several sets of defaults. ([rhbz#1817547])
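
For example, with a single set of defaults (the meta attribute and value
are illustrative only):

  pcs resource defaults update resource-stickiness=100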


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Miroslav Lisik, Ondrej Mular, Reid Wahl and Tomas
Jelinek.

Cheers,
Tomas


[ghissue#241]: https://github.com/ClusterLabs/pcs/issues/241
[rhbz#1222691]: https://bugzilla.redhat.com/show_bug.cgi?id=1222691
[rhbz#1741056]: https://bugzilla.redhat.com/show_bug.cgi?id=1741056
[rhbz#1817547]: https://bugzilla.redhat.com/show_bug.cgi?id=1817547
[rhbz#1843079]: https://bugzilla.redhat.com/show_bug.cgi?id=1843079
[rhbz#1857295]: https://bugzilla.redhat.com/show_bug.cgi?id=1857295
[rhbz#1862966]: https://bugzilla.redhat.com/show_bug.cgi?id=1862966
[rhbz#1867516]: https://bugzilla.redhat.com/show_bug.cgi?id=1867516
[rhbz#1869399]: https://bugzilla.redhat.com/show_bug.cgi?id=1869399
[rhbz#1875301]: https://bugzilla.redhat.com/show_bug.cgi?id=1875301

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Format of '--lifetime' in 'pcs resource move'

2020-08-25 Thread Tomas Jelinek

Hi all,

The lifetime value is indeed expected to be an ISO 8601 duration. I updated 
pcs documentation to clarify that:

https://github.com/ClusterLabs/pcs/commit/1e9650a8fd5b8a0a22911ddca1010de582684971
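
So, for example, a move whose ban expires after 20 minutes would look like
this (resource and node names are placeholders):

  pcs resource move my_resource node2 lifetime=PT20M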

Please note constraints are not removed from CIB when their lifetime 
expires. They are rendered ineffective but still preserved in CIB. See 
the following bugzilla for more details:

https://bugzilla.redhat.com/show_bug.cgi?id=1442116

Regards,
Tomas


Dne 21. 08. 20 v 7:56 Ulrich Windl napsal(a):

Strahil Nikolov  schrieb am 20.08.2020 um 18:25 in

Nachricht <329b5d02-2bcb-4a2c-bc2b-ca3030e6a...@yahoo.com>:

Have you tried the ISO 8601 format?
For example: 'PT20M'


And watch out not to mix Minutes wth Months ;-)



The ISO format is described at:
https://manpages.debian.org/testing/crmsh/crm.8.en.html

Best Regards,
Strahil Nikolov

На 20 август 2020 г. 13:40:16 GMT+03:00, Digimer  написа:

Hi all,

  Reading the pcs man page for the 'move' action, it talks about
'--lifetime' switch that appears to control when the location
constraint
is removed;


   move <resource id> [destination node] [--master]
   [lifetime=<lifetime>] [--wait[=n]]
  Move the resource off the node it is  currently  running
  on  by  creating  a -INFINITY location constraint to ban
  the node. If destination node is specified the  resource
  will be moved to that node by creating an INFINITY loca‐
  tion constraint  to  prefer  the  destination  node.  If
  --master  is used the scope of the command is limited to
  the master role and you must use the promotable clone id
  (instead  of  the resource id). If lifetime is specified
  then the constraint will expire after that time,  other‐
  wise  it  defaults to infinity and the constraint can be
  cleared manually with 'pcs resource clear' or 'pcs  con‐
  straint  delete'.  If --wait is specified, pcs will wait
  up to 'n' seconds for the  resource  to  move  and  then
  return  0 on success or 1 on error. If 'n' is not speci‐
  fied it defaults to 60 minutes. If you want the resource
  to preferably avoid running on some nodes but be able to
  failover to them use 'pcs constraint location avoids'.


I think I want to use this, as we move resources manually for various
reasons where the old host is still able to host the resource should a
node failure occur. So we'd love to immediately remove the location
constraint as soon as the move completes.

I tried using '--lifetime=60' as a test, assuming the format was
'seconds', but that was invalid. How is this switch meant to be used?

Cheers

--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay
Gould
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to specify which IP pcs should use?

2020-08-11 Thread Tomas Jelinek

Hi Mariusz,

You haven't mentioned the pcs version you are running. Based on you mentioning 
running Pacemaker 2, I suppose you are running pcs 0.10.x. The text 
below applies to pcs 0.10.x.


Pcs doesn't depend on or use corosync.conf when connecting to other 
nodes. The reason is pcs must be able to connect to nodes not specified 
in corosync.conf, e.g. when there is no cluster created yet.


Instead, pcs has its own config file mapping node names to addresses. 
The easiest way to set it is to specify an address for each node in the 
'pcs host auth' command like this:

pcs host auth <node1> addr=<addr1> <node2> addr=<addr2> ...

Specifying addresses is not mandatory. If the addresses are omitted, pcs 
uses node names as addresses. See man pcs for more details.


To fix your issues, run 'pcs host auth' and specify all nodes and their 
addresses. Running the command on one node of your cluster should be enough.
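
For example (node names and cluster-network addresses are placeholders):

  pcs host auth node1 addr=10.0.0.1 node2 addr=10.0.0.2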



Regards,
Tomas


Dne 10. 08. 20 v 14:21 Mariusz Gronczewski napsal(a):

Hi,

Pacemaker 2, the current setup is

* management network with host's hostname resolving to host's
   management IP
* cluster network for Pacemaker/Corosync communication
* corosync set up with node name and IP of the cluster network

pcs status shows both nodes online, added config syncs to the other
node etc., but pcs cluster status shows one node as offline.

After a look in the firewall logs, it appears all of the communication is
going just fine on the cluster network, but pcs tries to talk to pcsd
on port 2224 via the *management* network instead of using the IP set as
ring0_addr in corosync.

Is "just use the host's hostname regardless of config" normal behaviour?
Is there a separate pcs setting for which IP it should use?

Regards



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] unable to start corosync

2020-07-02 Thread Tomas Jelinek

Hello,

It's really hard to tell without any further info.

Have you run any other pcs commands before running "cluster start"? Can 
you share your /etc/corosync/corosync.conf file and logs 
/var/log/pacemaker/pacemaker.log ?


Regards,
Tomas


Dne 01. 07. 20 v 13:47 Филипп Линецкий napsal(a):

Good Afternoon.
I've installed Corosync on my server, and after running the command "sudo 
pcs cluster start --all" I get the error message "unable to start 
corosync". I use Ubuntu Server 16.04. How can I fix this error?

Thanks in advance.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] JQuery 1.2 < 3.5.0 Multiple XSS Vulnerability on PCS module

2020-07-02 Thread Tomas Jelinek

Hello,

See my response in the "jquery in pcs package" thread. Let's continue 
the discussion there.



Regards,
Tomas


Dne 01. 07. 20 v 15:35 S Sathish S napsal(a):

Hi Team,

We are getting the below vulnerability alert while using the pcs module. Can 
we know the mitigation plan or any corrective action required to fix this 
vulnerability?


Plugin ID             :  136929

Plugin Name      : JQuery 1.2 < 3.5.0 Multiple XSS

Port                     : TCP 2224

Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] jquery in pcs package

2020-07-02 Thread Tomas Jelinek

Hello,

The pcs team understands your concerns about the jQuery bundled in pcs / pcsd.

Unfortunately, it is not possible to upgrade this jQuery library to a 
newer version right now. We are, however, very well aware this is a 
problem. We have been working on a new version of the web UI which will 
be compatible with the newest libraries and will allow upgrading them 
continuously.


In the meantime, we are monitoring security issues reported against 
jQuery. The fact that there is a security issue in jQuery does not 
automatically mean the pcsd web UI is affected. The issue may be related to 
code which is not used by the pcsd web UI. In another instance, an 
issue has been reported to be found in a jQuery function which is 
actually not present in all the jQuery versions marked as affected by 
that issue.


As has been pointed out in this thread, security tools may base 
reporting issues only on libraries' version numbers.



Running pcsd web UI is optional. It is enabled by default but it can be 
easily and safely turned off. There is no negative impact on other pcs / 
pcsd functionalities. If you turn the web UI off, pcsd will still run 
and listen to network connections (port 2224 by default). However, all 
requests against the web UI will result in a simple page saying the web 
UI is not enabled and no jQuery library will be served.


To turn the web UI off, set:
PCSD_DISABLE_GUI=true
in /etc/sysconfig/pcsd (/etc/default/pcsd or other location depending on 
Linux distribution you are running) and restart pcsd service.


In newer versions of pcsd, you may also configure addresses pcsd binds 
to by setting PCSD_BIND_ADDR in the same file.
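
A minimal example of both settings (the address is a placeholder and the
file path may differ on your distribution):

  # /etc/sysconfig/pcsd
  PCSD_DISABLE_GUI=true
  PCSD_BIND_ADDR=192.168.122.10

  systemctl restart pcsd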


You are of course free to configure your firewall to block access to 
pcsd's port as you see fit, e.g. not allowing access from public networks.



Regards,
Tomas


Dne 02. 07. 20 v 12:40 Tony Stocker napsal(a):

On Wed, Jul 1, 2020 at 1:44 PM Tony Stocker  wrote:


So, first question: is this jquery something that is maintained,
promulgated by/with the Pacemaker installation? Or is this something
special that Red Hat is doing when they package it?


So, investigating the source code in GitHub, the inclusion of this
jquery is part of Pacemaker/pcs and related to the Web UI. So this
should be the proper forum to address it.


Second, if this is Pacemaker-maintained (not Red Hat) part of code, is
there a reason that it's such an old version, given that the current
version is 3.5.0, is used?


Based on the GitHub check-in date, it appears that this section of
code hasn't been updated in 7 years.


Finally, if this is Pacemaker-maintained (not Red Hat) part of code,
where can I find the documentation regarding the patching that's been
done to address the various cross-site scripting vulnerabilities? I'm
working under the assumption that the binary has been patched and the
vulnerabilities are no longer present, in which case I have to
document it with security. Obviously if the code has not been patched
and it's still vulnerable, that's a whole different issue.


So, one would assume since there haven't been any updates to the code
that this code is indeed vulnerable to all the XSS vulnerabilities,
which is not good. Regardless of anything else below, does anyone know
if there are any plans to update this part of the code to deal with
these security issues?

What appears to be worse is that this Web UI interface is not
optional, and runs on the communication port (default=2224) across all
interfaces on a system. So, even though I set up a cluster using host
names/addresses which are on a private lan, the security scanner tool
is still finding the Web UI running on port 2224 on the public IP
interface of the system. This can't be the correct/intended behavior,
can it? I'm thinking that this has to do with the setup step that I
see in pretty much all how-to documents that looks like this one from
the Red Hat 8 "Configuring and Maintaining High Availability Clusters"
document, section 4.7:

"If you are running the firewalld daemon, execute the following
commands to enable the ports that are required by the Red Hat High
Availability Add-On.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --add-service=high-availability"

Here is the description in the same document for Port 2224/tcp:
"Default pcsd port required on all nodes (needed by the pcsd Web UI
and required for node-to-node communication). You can configure the
pcsd port by means of the PCSD_PORT parameter in the
/etc/sysconfig/pcsd file.

It is crucial to open port 2224 in such a way that pcs from any node
can talk to all nodes in the cluster, including itself. When using the
Booth cluster ticket manager or a quorum device you must open port
2224 on all related hosts, such as Booth arbiters or the quorum device
host. "

Executing this command appears to add the 'high-availability'
"service" to all zones in firewalld, which I don't believe is needed,
or am I wrong? If you have nodes with multiple network 

Re: [ClusterLabs] custom cluster module

2020-06-16 Thread Tomas Jelinek

This error comes from pcs.

It means: your resource's metadata defines an option named 'binfile' and 
marks it as required, so the resource cannot run when this option is not 
specified. (This is done with the required="1" attribute of the parameter 
in the metadata.) Moreover, you tried to create the resource without 
specifying a value for the 'binfile' option.


So it is looking for a command line option for the pcs resource create.
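
Just to illustrate the idea (the agent name 'ocf:vendor:app' is taken from 
your e-mail; the binfile value and the resource/group names are placeholders):

  # show the agent's options as pcs sees them, including which are required
  pcs resource describe ocf:vendor:app

  # specify a value for every required option when creating the resource
  pcs resource create my_app ocf:vendor:app binfile=/path/to/app --group my_group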

Can you share your resource's metadata and the pcs command which gives 
you the error?



Regards,
Tomas


Dne 12. 06. 20 v 18:19 jim napsal(a):
I tried to add the module again using ocf:vendor:app and now I get an 
error "Error: required resource option 'binfile' is missing, use --force 
to override".  i'm not sure if that error has to do with the metadata, i 
do have binfile defined, or if it is looking for a command line option 
for the pcs resource create.  i did try adding binfile= and 
params binfile=, but none of those options worked.



On Thursday, June 11, 2020, 1:50:43 PM EDT, Ken Gaillot 
 wrote:



On Thu, 2020-06-11 at 15:37 +, jim wrote:
 > I made a copy on the anything module and modified for the application
 > I want to monitor.  I put the script in
 > /usr/lib/ocf/resoure.d//. The script works from
 > the command line to start and stop the application.  when I try to
 > add the module with "pcs resource create 
 > ocf:heartbeat: --group  " i get an error,

You want ocf:<provider>:<agent name>


 > "Error: Agent 'ocf:heartbeat:' is not installed or does
 > not provide valid metadata: Metadata query for
 > ocf:heartbeat: failed: Input/Output error, use --force
 > to override".  the script supports meta-data|metadata|meta_data.  I'm
 > not sure how to know what is valid metadat. I know i saw a utility to
 > check the scripts, but that does not appear to be installed/available
 > on redhat.

--
Ken Gaillot <kgail...@redhat.com>

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Missing or Permissive Content-Security-Policy frameancestors HTTP Response Header in pcsd

2020-05-22 Thread Tomas Jelinek

Hi,

We added sending the Content-Security-Policy in commits:
https://github.com/ClusterLabs/pcs/commit/d76924fda6574cdcdac4fc75f433dd58ae48cb2e
and
https://github.com/ClusterLabs/pcs/commit/76aa72a67d2f89c3f725a6e9187631c270e5bb0c
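
You can check whether your pcsd already sends the header with, for example 
(the node name is a placeholder):

  curl -sk -D - -o /dev/null https://node1:2224/ | grep -i content-security-policy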

Regards,
Tomas


Dne 19. 05. 20 v 10:06 Tomas Jelinek napsal(a):

Hi,

Even if you disable the pcsd GUI, the daemon is still running and 
listening on port 2224. It is needed for pcs to be able to communicate 
with and manage cluster nodes. The fact the page is accessible is expected.


What pcs version are you running?


Regards,
Tomas


Dne 18. 05. 20 v 9:25 S Sathish S napsal(a):

Hi Team,

We are getting below vulnerable alert while using pcs , we are not 
using pcs Web UI interface can we know mitigation plan for this.


Plugin ID     :  50344

Plugin Name  : Missing or Permissive Content-Security-Policy 
frameancestors HTTP Response Header


Port     : TCP 2224

We have tried disabled Web UI interface and restart pcsd service , 
Still page is accessible and login page display “PCSD GUI is disabled”


*Configuration File* :

# cat /etc/sysconfig/pcsd  | grep -i GUI

# Set DISABLE_GUI to true to disable GUI frontend in pcsd

PCSD_DISABLE_GUI=true

*Web UI Details* :

https://<IP Address>:2224/login

Print “ PCSD GUI is disabled”

Can you suggest the way-forward for the same.

Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Missing or Permissive Content-Security-Policy frameancestors HTTP Response Header in pcsd

2020-05-19 Thread Tomas Jelinek

Hi,

Even if you disable the pcsd GUI, the daemon is still running and 
listening on port 2224. It is needed for pcs to be able to communicate 
with and manage cluster nodes. The fact the page is accessible is expected.


What pcs version are you running?


Regards,
Tomas


Dne 18. 05. 20 v 9:25 S Sathish S napsal(a):

Hi Team,

We are getting below vulnerable alert while using pcs , we are not using 
pcs Web UI interface can we know mitigation plan for this.


Plugin ID             :  50344

Plugin Name      : Missing or Permissive Content-Security-Policy 
frameancestors HTTP Response Header


Port                     : TCP 2224

We have tried disabled Web UI interface and restart pcsd service , Still 
page is accessible and login page display “PCSD GUI is disabled”


*Configuration File* :

# cat /etc/sysconfig/pcsd  | grep -i GUI

# Set DISABLE_GUI to true to disable GUI frontend in pcsd

PCSD_DISABLE_GUI=true

*Web UI Details* :

https://<IP Address>:2224/login

Print “ PCSD GUI is disabled”

Can you suggest the way-forward for the same.

Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Off-line build-time cluster configuration

2020-04-24 Thread Tomas Jelinek

Preliminary version of this feature has been merged upstream:
https://github.com/ClusterLabs/pcs/commit/425edf7aaff3c0744d9ae32bceb96f4bbfd39ee1

At the moment, only RPM packages built by continuous integration system 
are available:

https://kronosnet.org/builds/pcs/
Note these are automatically built unofficial packages, they are not 
guaranteed to work flawlessly.


Regards,
Tomas


Dne 16. 04. 20 v 14:00 Tomas Jelinek napsal(a):

Hi Craig,

Currently, there is no support in RHEL8 for an equivalent of the --local 
option of the 'pcs cluster setup' command from RHEL7. We were focusing on 
higher-priority tasks related to supporting the new major version of 
corosync and knet. As a part of this, the 'pcs cluster setup' command 
has been completely overhauled providing better functionality overall, 
like improved validations, synchronizing other files than just 
corosync.conf and so on. Sadly, we didn't have enough capacity to 
support the --local option in step 1.


We are working on adding support for the --local option (or its 
equivalent) in the near future, but we don't have any code to share yet.



Obviously, the --local version of the setup will skip some tasks done in 
the regular cluster setup command. You are expected to do them by other 
means. I'll put them all here for the sake of completeness, even though 
not all of them apply in your situation:

* check that nodes are not running or configured to run a cluster
* check that nodes do have cluster daemons installed in matching versions
* run 'pcs cluster destroy' on each node to get rid of all cluster 
config files and be sure there are no leftovers from previously 
configured clusters
* delete /var/lib/pcsd/pcs_settings.conf file (this is not done by the 
'pcs cluster destroy' command)

* distribute pcs auth tokens for the nodes
* distribute corosync and pacemaker authkeys, /etc/corosync/authkey and 
/etc/pacemaker/authkey respectively
* synchronize pcsd certificates (only needed if you intend to use pcs 
web UI in an HA mode)

* distribute corosync.conf
Let me know if you have any questions regarding these.


Running the current 'pcs cluster setup' command on all nodes is not 
really an option. The command requires the nodes to be online as it 
stores corosync.conf and other files to them over the network.


You may, however, run it once on a live cluster to get an idea of what 
the corosync.conf looks like and turn it into a template. I don't really 
expect its format or schema to be changed significantly during the RHEL8 
life cycle. I understand your concerns regarding this approach, but it 
would give you at least some option to proceed until the --local is 
supported in pcs.



Regards,
Tomas


Dne 14. 04. 20 v 20:46 Craig Johnston napsal(a):

Hello,

Sorry if this has already been covered, but a perusal of recent mail 
archives didn't turn up anything for me.


We are looking for help in configuring a pacemaker/corosync cluster at 
the time the Linux root file system is built, or perhaps as part of a 
"pre-pivot" process in the initramfs of a live-CD environment.


We are using the RHEL versions of the cluster products.  Current 
production is RHEL7 based, and we are trying to move to RHEL8.


The issues we have stem from the configuration tools' expectation that 
they are operating on a live system, with all cluster nodes available 
on the network.  This is obviously not the case during a "kickstart" 
install and configuration process.  It's also not true in an embedded 
environment where all nodes are powered simultaneously and expected to 
become operational without any human intervention.


We create the cluster configuration from a "system model", that 
describes the available nodes, cluster managed services, fencing 
agents, etc..  This model is different for each deployment, and is 
used as input to create a customized Linux distribution that is 
deployed to a set of physical hardware, virtual machines, or 
containers.  Each node, and it's root file system, is required to be 
configured and ready to go, the very first time it is ever booted.  
The on-media Linux file system is also immutable, and thus each boot 
is exactly like the previous one.


Under RHEL7, we were able to use the "pcs" command to create the 
corosync.conf/cib.xml files for each node.

e.g.
   pcs cluster setup --local --enable --force --name mycluster 
node1 node2 node3

   pcs -f ${CIB} property set startup-fencing=false
   pcs -f ${CIB} resource create tftp ocf:heartbeat:Xinetd 
  service=tftp  --group grp_tftp

   etc...

Plus a little "awk" "sed" on the corosync.conf file, and we were able 
to create a working configuration that worked out of the box. It's not 
pretty, but it works in spite of the fact that we feel like we're 
swimming up stream.


Under RHEL8 however, the "pcs cluster" command no longer has a 
"--local" option.  We can't find a

Re: [ClusterLabs] Off-line build-time cluster configuration

2020-04-16 Thread Tomas Jelinek

Hi Craig,

Currently, there is no support in RHEL8 for an equivalent of the --local 
option of the 'pcs cluster setup' command from RHEL7. We were focusing on 
higher-priority tasks related to supporting the new major version of 
corosync and knet. As a part of this, the 'pcs cluster setup' command 
has been completely overhauled providing better functionality overall, 
like improved validations, synchronizing other files than just 
corosync.conf and so on. Sadly, we didn't have enough capacity to 
support the --local option in step 1.


We are working on adding support for the --local option (or its 
equivalent) in the near future, but we don't have any code to share yet.



Obviously, the --local version of the setup will skip some tasks done in 
the regular cluster setup command. You are expected to do them by other 
means. I'll put them all here for the sake of completeness, even though 
not all of them apply in your situation:

* check that nodes are not running or configured to run a cluster
* check that nodes do have cluster daemons installed in matching versions
* run 'pcs cluster destroy' on each node to get rid of all cluster 
config files and be sure there are no leftovers from previously 
configured clusters
* delete /var/lib/pcsd/pcs_settings.conf file (this is not done by the 
'pcs cluster destroy' command)

* distribute pcs auth tokens for the nodes
* distribute corosync and pacemaker authkeys, /etc/corosync/authkey and 
/etc/pacemaker/authkey respectively
* synchronize pcsd certificates (only needed if you intend to use pcs 
web UI in an HA mode)

* distribute corosync.conf
Let me know if you have any questions regarding these.
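
A very rough sketch of how the key and config distribution could be done 
manually over ssh (node names are placeholders; in an image-build environment 
you would instead place the same files into each node's root filesystem):

  # generate the keys once
  corosync-keygen                                    # creates /etc/corosync/authkey
  dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1

  # copy the keys and the prepared corosync.conf to every node
  for n in node1 node2 node3; do
    scp /etc/corosync/authkey   $n:/etc/corosync/authkey
    scp /etc/pacemaker/authkey  $n:/etc/pacemaker/authkey
    scp corosync.conf           $n:/etc/corosync/corosync.conf
  done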


Running the current 'pcs cluster setup' command on all nodes is not 
really an option. The command requires the nodes to be online as it 
stores corosync.conf and other files to them over the network.


You may, however, run it once on a live cluster to get an idea of what 
the corosync.conf looks like and turn it into a template. I don't really 
expect its format or schema to be changed significantly during the RHEL8 
life cycle. I understand your concerns regarding this approach, but it 
would give you at least some option to proceed until the --local is 
supported in pcs.
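
For orientation, a minimal corosync.conf produced by 'pcs cluster setup' on 
RHEL8 looks roughly like the following (a sketch for corosync 3.x with knet; 
cluster and node names are placeholders, so do verify against a real setup):

  totem {
      version: 2
      cluster_name: mycluster
      transport: knet
      crypto_cipher: aes256
      crypto_hash: sha256
  }

  nodelist {
      node {
          ring0_addr: node1
          name: node1
          nodeid: 1
      }

      node {
          ring0_addr: node2
          name: node2
          nodeid: 2
      }
  }

  quorum {
      provider: corosync_votequorum
      two_node: 1
  }

  logging {
      to_logfile: yes
      logfile: /var/log/cluster/corosync.log
      to_syslog: yes
  }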



Regards,
Tomas


Dne 14. 04. 20 v 20:46 Craig Johnston napsal(a):

Hello,

Sorry if this has already been covered, but a perusal of recent mail 
archives didn't turn up anything for me.


We are looking for help in configuring a pacemaker/corosync cluster at 
the time the Linux root file system is built, or perhaps as part of a 
"pre-pivot" process in the initramfs of a live-CD environment.


We are using the RHEL versions of the cluster products.  Current 
production is RHEL7 based, and we are trying to move to RHEL8.


The issues we have stem from the configuration tools' expectation that 
they are operating on a live system, with all cluster nodes available on 
the network.  This is obviously not the case during a "kickstart" 
install and configuration process.  It's also not true in an embedded 
environment where all nodes are powered simultaneously and expected to 
become operational without any human intervention.


We create the cluster configuration from a "system model", that 
describes the available nodes, cluster managed services, fencing agents, 
etc..  This model is different for each deployment, and is used as input 
to create a customized Linux distribution that is deployed to a set of 
physical hardware, virtual machines, or containers.  Each node, and it's 
root file system, is required to be configured and ready to go, the very 
first time it is ever booted.  The on-media Linux file system is also 
immutable, and thus each boot is exactly like the previous one.


Under RHEL7, we were able to use the "pcs" command to create the 
corosync.conf/cib.xml files for each node.

e.g.
   pcs cluster setup --local --enable --force --name mycluster 
node1 node2 node3

   pcs -f ${CIB} property set startup-fencing=false
   pcs -f ${CIB} resource create tftp ocf:heartbeat:Xinetd 
  service=tftp  --group grp_tftp

   etc...

Plus a little "awk" "sed" on the corosync.conf file, and we were able to 
create a working configuration that worked out of the box. It's not 
pretty, but it works in spite of the fact that we feel like we're 
swimming up stream.


Under RHEL8 however, the "pcs cluster" command no longer has a "--local" 
option.  We can't find any tool to replace it's functionality.  We can 
use "cibadmin --empty" to create a starting cib.xml file, but there is 
no way to add nodes to it (or create the corosync.conf file with nodes".


Granted, we could write our own tools to create template 
corosync.conf/cib.xml files, and "pcs -f" still works.  However, that 
leaves us in the unenviable position where the cluster configuration 
schema could change, and our tools would not be the wiser.  We'd much 
prefer to use a standard and maintained interface for 

[ClusterLabs] pcs 0.9.169 released

2020-04-14 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.9.169.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.169.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.169.zip

This version brings new commands for displaying resource dependencies
and safe-disabling resources (only disable a resource if no other
resources would be affected) backported from the pcs-0.10 branch. There
are also a few bugfixes.


Complete change log for this release:
## [0.9.169] - 2020-04-09

### Added
- `pcs resource relations` command shows relations between resources
  such as ordering constraints, ordering set constraints and relations
  defined by resource hierarchy ([rhbz#1770975])
- `pcs resource disable` can show effects of disabling resources and
  prevent disabling resources if any other resources would be affected
  ([rhbz#1770973])

### Fixed
- Do not generate custom DH key if it was not requested by setting its
  custom length ([rhbz#1760434])
- Documentation of `pcs constraint colocation add` ([rhbz#1500012])
- Empty constraint options are not allowed in `pcs constraint order` and
  `pcs constraint colocation add` commands ([rhbz#1500012])
- Fixed an issue where some pcs commands could not connect to cluster
  nodes over IPv6
- More fixes for the case when PATH environment variable is not set
  ([rhbz#1671174])
- Error messages in cases when cluster is not set up ([rhbz#1448569])
- Fix documentation and flags regarding bundled/cloned/grouped resources
  for `pcs (resource | stonith) (cleanup | refresh)` ([rhbz#1759269])

### Changed
- Pcsd no longer sends Server HTTP header ([rhbz#1765606])


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Ondrej Mular and Tomas Jelinek.

Cheers,
Tomas


[rhbz#1448569]: https://bugzilla.redhat.com/show_bug.cgi?id=1448569
[rhbz#1500012]: https://bugzilla.redhat.com/show_bug.cgi?id=1500012
[rhbz#1671174]: https://bugzilla.redhat.com/show_bug.cgi?id=1671174
[rhbz#1759269]: https://bugzilla.redhat.com/show_bug.cgi?id=1759269
[rhbz#1760434]: https://bugzilla.redhat.com/show_bug.cgi?id=1760434
[rhbz#1765606]: https://bugzilla.redhat.com/show_bug.cgi?id=1765606
[rhbz#1770973]: https://bugzilla.redhat.com/show_bug.cgi?id=1770973
[rhbz#1770975]: https://bugzilla.redhat.com/show_bug.cgi?id=1770975

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Integrate external alerts with pcs cluster

2020-04-14 Thread Tomas Jelinek

Hi Ajay,

Dne 11. 04. 20 v 13:04 Ajay Srivastava napsal(a):


Hi,
In my environment I have a pacemaker cluster running which has various 
software services as resource agents.
These services have dependency on some hardware components. There is a 
service which monitors the hardware and sends alerts if anything is 
wrong with the hardware.


My plan is to add a resource agent for each hardware component with 
proper dependencies and fail this resource agent if I get a critical 
alert from hardware monitoring service. Please note that I do not want 
to use monitor functionality of resource agent as there is a service 
which is already doing same thing. I have two queries here -
1) Does the approach look good ? Is there a better way to implement it 
in pacemaker cluster ?


So you plan to fail the resource by calling crm_resource --fail from 
your monitoring service, right? How about having the agent's monitor 
operation simply check the status / output of your existing monitoring service?


2) I can find --fail option in crm_resource but not in pcs cli. What 
would be the equivalent command in pcs as I am using pcs cli to 
configure the cluster ?


There is no equivalent for --fail in pcs. Just use crm_resource for that.
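
For example (resource and node names are placeholders):

  crm_resource --fail --resource my_hw_component --node node1

This records a failure of the resource on the given node, and the cluster 
then reacts according to the configured constraints and failure handling.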

Regards,
Tomas



Regards,
Ajay


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pcs cluster auth succeeds but pcs cluster setup failed with"error checking node availability: Unable to authenticate ..."

2020-03-09 Thread Tomas Jelinek

Hi Ken,

It looks like you are hitting this bug:
https://bugs.launchpad.net/ubuntu/+source/pcs/+bug/1640923

Try removing /etc/corosync/corosync.conf on both your nodes and run the 
commands again.
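
In other words, something along these lines (a sketch based on the node names 
from your output; the cluster name is a placeholder):

  # on both nodes: remove the distribution's default corosync.conf
  rm /etc/corosync/corosync.conf

  # then, from one node only:
  pcs cluster auth mirror1 mirror2 -u hacluster
  pcs cluster setup --name mycluster mirror1 mirror2
  pcs cluster start --all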


More info regarding the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1517333
https://github.com/ClusterLabs/pcs/issues/153

Regards,
Tomas


Dne 09. 03. 20 v 5:35 Ken napsal(a):

Hi Pacemaker users,

     I'm new to pacemaker, and recently I started from Pacemaker 2.0 doc 
-- *Clusters from Scratch (en-US)*, but got bogged down in cluster 
setting up step "pcs cluster auth ... and pcs cluster setup ..."
     While running pcs cluster auth, it says Authorized but the console 
debug output has errors.


root@mirror1:/home/m1# pcs cluster auth mirror1 mirror2 --debug
Running: /usr/bin/ruby -I/usr/share/pcsd/ /usr/share/pcsd/pcsd-cli.rb 
read_tokens

--Debug Input Start--
{}
--Debug Input End--
Return Value: 0
--Debug Output Start--
{
   "status": "ok",
   "data": {
   },
   "log": [
     "I, [2020-03-08T21:24:11.121997 #14049]  INFO -- : PCSD Debugging 
enabled\n",
     "D, [2020-03-08T21:24:11.122027 #14049] DEBUG -- : Did not detect 
RHEL 6\n",
     "I, [2020-03-08T21:24:11.122059 #14049]  INFO -- : Running: 
/usr/sbin/corosync-cmapctl totem.cluster_name\n",
     "I, [2020-03-08T21:24:11.122073 #14049]  INFO -- : CIB USER: 
hacluster, groups: \n",
     "D, [2020-03-08T21:24:11.127385 #14049] DEBUG -- : 
[\"totem.cluster_name (str) = debian\\n\"]\n",

     "D, [2020-03-08T21:24:11.127432 #14049] DEBUG -- : []\n",
     "D, [2020-03-08T21:24:11.127451 #14049] DEBUG -- : Duration: 
0.005305199s\n",

     "I, [2020-03-08T21:24:11.127513 #14049]  INFO -- : Return Value: 0\n",
     "W, [2020-03-08T21:24:11.127655 #14049]  WARN -- : Cannot read 
config 'tokens' from '/var/lib/pcsd/tokens': No such file or directory @ 
rb_sysopen - /var/lib/pcsd/tokens\n",
     "E, [2020-03-08T21:24:11.127694 #14049] ERROR -- : Unable to parse 
tokens file: A JSON text must at least contain two octets!\n"

   ]
}
--Debug Output End--

Sending HTTP Request to: https://mirror1:2224/remote/check_auth
Data: None
Response Code: 401
Username: hacluster
Password:
Running: /usr/bin/ruby -I/usr/share/pcsd/ /usr/share/pcsd/pcsd-cli.rb auth
--Debug Input Start--
{"username": "hacluster", "local": false, "nodes": ["mirror1", 
"mirror2"], "password": "123456", "force": false}

--Debug Input End--
Return Value: 0
--Debug Output Start--
{
   "status": "ok",
   "data": {
     "auth_responses": {
   "mirror1": {
     "status": "ok",
     "token": "2873a435-339b-402f-854f-542379b480ec"
   },
   "mirror2": {
     "status": "ok",
     "token": "16f6346d-742c-454d-ac11-55137189baac"
   }
     },
     "sync_successful": true,
     "sync_nodes_err": [

     ],
     "sync_responses": {
     }
   },
   "log": [
     "I, [2020-03-08T21:24:16.598892 #14060]  INFO -- : PCSD Debugging 
enabled\n",
     "D, [2020-03-08T21:24:16.598922 #14060] DEBUG -- : Did not detect 
RHEL 6\n",
     "I, [2020-03-08T21:24:16.598956 #14060]  INFO -- : Running: 
/usr/sbin/corosync-cmapctl totem.cluster_name\n",
     "I, [2020-03-08T21:24:16.598969 #14060]  INFO -- : CIB USER: 
hacluster, groups: \n",
     "D, [2020-03-08T21:24:16.604361 #14060] DEBUG -- : 
[\"totem.cluster_name (str) = debian\\n\"]\n",

     "D, [2020-03-08T21:24:16.604405 #14060] DEBUG -- : []\n",
     "D, [2020-03-08T21:24:16.604423 #14060] DEBUG -- : Duration: 
0.005386404s\n",

     "I, [2020-03-08T21:24:16.604459 #14060]  INFO -- : Return Value: 0\n",
     "W, [2020-03-08T21:24:16.605121 #14060]  WARN -- : Cannot read 
config 'tokens' from '/var/lib/pcsd/tokens': No such file or directory @ 
rb_sysopen - /var/lib/pcsd/tokens\n",
     "E, [2020-03-08T21:24:16.605196 #14060] ERROR -- : Unable to parse 
tokens file: A JSON text must at least contain two octets!\n",
     "I, [2020-03-08T21:24:16.605227 #14060]  INFO -- : SRWT Node: 
mirror1 Request: check_auth\n",
     "E, [2020-03-08T21:24:16.605238 #14060] ERROR -- : Unable to 
connect to node mirror1, no token available\n",
     "W, [2020-03-08T21:24:16.605361 #14060]  WARN -- : Cannot read 
config 'tokens' from '/var/lib/pcsd/tokens': No such file or directory @ 
rb_sysopen - /var/lib/pcsd/tokens\n",
     "E, [2020-03-08T21:24:16.605406 #14060] ERROR -- : Unable to parse 
tokens file: A JSON text must at least contain two octets!\n",
     "I, [2020-03-08T21:24:16.605419 #14060]  INFO -- : SRWT Node: 
mirror2 Request: check_auth\n",
     "E, [2020-03-08T21:24:16.605430 #14060] ERROR -- : Unable to 
connect to node mirror2, no token available\n",
     "I, [2020-03-08T21:24:16.690075 #14060]  INFO -- : Running: 
/usr/sbin/pcs status nodes corosync\n",
     "I, [2020-03-08T21:24:16.690132 #14060]  INFO -- : CIB USER: 
hacluster, groups: \n",
     "D, [2020-03-08T21:24:16.816941 #14060] DEBUG -- : [\"Corosync 
Nodes:\\n\", \" Online:\\n\", \" Offline:\\n\"]\n",

     "D, [2020-03-08T21:24:16.816995 #14060] DEBUG -- : []\n",
     

Re: [ClusterLabs] pacemaker certificate is not generated with SubjectAlternativeName

2020-03-02 Thread Tomas Jelinek

Dne 28. 02. 20 v 8:06 S Sathish S napsal(a):

Hi Team,

We have found that the Pacemaker certificate is not generated with 
SubjectAlternativeName.


You are right, SubjectAlternativeName is not specified. A minor correction, 
though: it's the pcsd certificate, not a pacemaker one.



Please find the general guidelines :

In case/client certificates are required, verification of the client 
identity SHOULD use the first matching subjectAltName field of the 
client certificate to be compared with an authorization identity present 
in a local or central AA database. /


Client certificates are not used with pcsd.

/To mitigate the Man-in-the-Middle risk, the server identity 
verification is RECOMMENDED to be done as well. A client can accept 
several server certificates in certificate validation issued by the same 
trusted CA./

//

/After certificate chain validation, the TLS client MUST check the 
identity of the server with a configured reference identity (e.g., a 
hostname). The clients MUST support checks using the subjectAltName 
field with type dNSName. If the certificate contains multiple 
subjectAltNamevalues then a match with any one of the fields is 
considered acceptable. /


I don't see anything here that would say subjectAltName is required to 
be present in certificates. Does the fact that subjectAltName is not defined 
cause you any specific problems?


In any case, you are free and recommended to replace the default pcsd 
certificate with your own. You can use 'pcs pcsd certkey' and 'pcs pcsd 
sync-certificates' to do so.
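
For example, a certificate with a SAN could be generated with openssl and 
loaded into pcsd roughly like this (a sketch; requires OpenSSL 1.1.1+ for 
-addext, and the host names are placeholders):

  openssl req -x509 -newkey rsa:2048 -nodes -days 730 \
      -keyout pcsd.key -out pcsd.crt \
      -subj "/CN=node1.example.com" \
      -addext "subjectAltName=DNS:node1.example.com,DNS:node2.example.com"

  pcs pcsd certkey pcsd.crt pcsd.key
  pcs pcsd sync-certificates

  # note: sync-certificates pushes the local node's certificate to all cluster
  # nodes, so either list all node names in the SAN or run 'pcs pcsd certkey'
  # separately on each node and skip the sync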



Regards,
Tomas


Current Certificate details:

#keytool -printcert -file /var/lib/pcsd/pcsd.crt
Owner: CN=XXX, OU=pcsd, O=pcsd, L=Minneapolis, ST=MN, C=US
Issuer: CN=XXX, OU=pcsd, O=pcsd, L=Minneapolis, ST=MN, C=US
Serial number: 1703482bc5b
Valid from: Tue Feb 11 14:49:08 CET 2020 until: Fri Feb 08 14:49:08 CET 2030
Certificate fingerprints:
          MD5:  6E:C9:F8:E2:B9:F7:F6:65:53:B4:BD:B9:18:71:B9:78
          SHA1: 9E:7C:22:DA:61:AA:86:DB:D1:74:D4:AC:47:CA:DC:06:6A:21:C2:F0
          SHA256: 
1D:8D:88:55:70:FE:01:BB:DB:5C:BD:E7:FF:79:62:02:CB:64:97:A7:16:A4:29:49:F1:94:8E:2F:7B:FC:D4:B5

Signature algorithm name: SHA256withRSA
Subject Public Key Algorithm: 2048-bit RSA key
Version: 3

*Sample  certificate with SubjectedAltName details:*

#3: ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
   DNSName: XXX
   DNSName: XXX]

Thanks and Regards,

S Sathish S


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PCS cluster auth fails

2020-02-07 Thread Tomas Jelinek

Hi,

I haven't seen such a traceback before. It looks like there are pcsd 
files missing 
(/usr/lib/pcsd/vendor/bundle/ruby/gems/rack-1.6.10/lib/rack/multipart/parser.rb). 
Is this the case? If so, can you try reinstalling pcs packages on your 
nodes?
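
For example, on an RPM-based system (a sketch):

  yum reinstall pcs        # or: dnf reinstall pcs
  systemctl restart pcsd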


Regards,
Tomas


Dne 07. 02. 20 v 7:39 Somanath Jeeva napsal(a):

Hi ,

I am using a two node corosync cluster(node1 and node2). When I do pcs 
cluster auth during corosync-qdevice configuration to the qdevice 
node(qnode), I am getting the below error,


/$ sudo pcs cluster auth qnode -u hacluster -p  --debug/

/Running: /usr/bin/ruby -I/usr/lib/pcsd/ /usr/lib/pcsd/pcsd-cli.rb auth/

/Environment:/

/  GEM_HOME=/usr/lib/pcsd/vendor/bundle/ruby/

/  HISTSIZE=1000/

/  HOME=/root/

/  HOSTNAME=node1/

/  LANG=en_US.UTF-8/

/  LC_ALL=C/

/  LOGNAME=root/

/  
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:/


/  MAIL=/var/spool/mail/admin/

/  PATH=/sbin:/bin:/usr/sbin:/usr/bin/

/  PCSD_DEBUG=true/

/  PCSD_NETWORK_TIMEOUT=60/

/  PS1=[\u@\h-$node_name \W]\$ /

/  SHELL=/sbin/nologin/

/  SUDO_COMMAND=/sbin/pcs cluster auth qnode -u hacluster -p *** 
--debug/


/  SUDO_GID=5007/

/  SUDO_UID=5008/

/  SUDO_USER=admin/

/  TERM=xterm/

/  USER=root/

/  USERNAME=root/

/--Debug Input Start--/

/{"username": "hacluster", "local": false, "nodes": {"qnode": null}, 
"password": "***", "force": false}/


/--Debug Input End--/

//

/Finished running: /usr/bin/ruby -I/usr/lib/pcsd/ 
/usr/lib/pcsd/pcsd-cli.rb auth/


/Return value: 0/

/--Debug Stdout Start--/

/{/

/  "status": "ok",/

/  "data": {/

/    "auth_responses": {/

/  "qnode": {/

/    "status": "ok",/

/    "token": "66c1020d-1089-4d8b-beab-a8f76e2c8b89"/

/  }/

/    },/

/    "sync_successful": true,/

/    "sync_nodes_err": [/

/  "node2"/

/    ],/

/    "sync_responses": {/

/  "node2": {/

/    "status": "error"/

/  },/

/  "node1": {/

/    "status": "ok",/

/    "result": {/

/  "tokens": "accepted"/

/    }/

/  }/

/    }/

/  },/

/  "log": [/

/    "I, [2020-02-07T17:11:34.011890 #30323]  INFO -- : PCSD Debugging 
enabled\n",/


/    "D, [2020-02-07T17:11:34.012401 #30323] DEBUG -- : Did not detect 
RHEL 6\n",/


/    "D, [2020-02-07T17:11:34.012446 #30323] DEBUG -- : Detected systemd 
is in use\n",/


/    "I, [2020-02-07T17:11:34.160540 #30323]  INFO -- : Running: 
/usr/sbin/corosync-cmapctl totem.cluster_name\n",/


/    "I, [2020-02-07T17:11:34.160707 #30323]  INFO -- : CIB USER: 
hacluster, groups: \n",/


/    "D, [2020-02-07T17:11:34.176123 #30323] DEBUG -- : 
[\"totem.cluster_name (str) = HBASE\\n\"]\n",/


/    "D, [2020-02-07T17:11:34.176307 #30323] DEBUG -- : []\n",/

/    "D, [2020-02-07T17:11:34.176357 #30323] DEBUG -- : Duration: 
0.015370265s\n",/


/    "I, [2020-02-07T17:11:34.176443 #30323]  INFO -- : Return Value: 0\n",/

/    "I, [2020-02-07T17:11:34.407886 #30323]  INFO -- : Running: 
/usr/sbin/pcs status nodes corosync\n",/


/    "I, [2020-02-07T17:11:34.407989 #30323]  INFO -- : CIB USER: 
hacluster, groups: \n",/


/    "D, [2020-02-07T17:11:34.775751 #30323] DEBUG -- : [\"Corosync 
Nodes:\\n\", \" Online: node2 node1\\n\", \" Offline:\\n\"]\n",/


/    "D, [2020-02-07T17:11:34.775909 #30323] DEBUG -- : []\n",/

/    "D, [2020-02-07T17:11:34.775960 #30323] DEBUG -- : Duration: 
0.367740571s\n",/


/    "I, [2020-02-07T17:11:34.776056 #30323]  INFO -- : Return Value: 0\n",/

/    "I, [2020-02-07T17:11:34.776517 #30323]  INFO -- : Sending config 
'tokens' version 5 40712570bb7e5718afce943a151b259f04e7c080 to nodes: 

[ClusterLabs] pcs 0.10.4 released

2019-11-28 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.4.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.4.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.4.zip

This version brings new commands for displaying resource dependencies
and safe-disabling resources (only disable a resource if no other
resources would be affected). There is also a bunch of bugfixes.

Complete change log for this release:
## [0.10.4] - 2019-11-28

### Added
- New section in pcs man page summarizing changes in pcs-0.10. Commands
  removed or changed in pcs-0.10 print errors pointing to that section.
  ([rhbz#1728890])
- `pcs resource disable` can show effects of disabling resources and
  prevent disabling resources if any other resources would be affected
  ([rhbz#1631519])
- `pcs resource relations` command shows relations between resources
  such as ordering constraints, ordering set constraints and relations
  defined by resource hierarchy ([rhbz#1631514])

### Changed
- Expired location constraints are now hidden by default when listing
  constraints in any way. Using `--all` will list and denote them with
  `(expired)`. All expired rules are then marked the same way.
  ([rhbz#1442116])

### Fixed
- All node names and scores are validated when running `pcs constraint
  location avoids/prefers` before writing configuration to cib
  ([rhbz#1673835])
- Fixed crash when an invalid port is given in an address to the `pcs
  host auth` command ([rhbz#1698763])
- Command `pcs cluster verify` suggests `--full` option instead of `-V`
  option which is not recognized by pcs ([rhbz#1712347])
- It is now possible to authenticate remote clusters in web UI even if
  the local cluster is not authenticated ([rhbz#1743735])
- Documentation of `pcs constraint colocation add` ([rhbz#1734361])
- Empty constraint options are not allowed in `pcs constraint order` and
  `pcs constraint colocation add` commands ([rhbz#1734361])
- More fixes for the case when PATH environment variable is not set
- Fixed crashes and other issues when UTF-8 characters are present in
  the corosync.conf file ([rhbz#1741586])


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Michal Pospisil, Miroslav Lisik, Ondrej Mular and
Tomas Jelinek.

Cheers,
Tomas


[rhbz#1442116]: https://bugzilla.redhat.com/show_bug.cgi?id=1442116
[rhbz#1631514]: https://bugzilla.redhat.com/show_bug.cgi?id=1631514
[rhbz#1631519]: https://bugzilla.redhat.com/show_bug.cgi?id=1631519
[rhbz#1673835]: https://bugzilla.redhat.com/show_bug.cgi?id=1673835
[rhbz#1698763]: https://bugzilla.redhat.com/show_bug.cgi?id=1698763
[rhbz#1712347]: https://bugzilla.redhat.com/show_bug.cgi?id=1712347
[rhbz#1728890]: https://bugzilla.redhat.com/show_bug.cgi?id=1728890
[rhbz#1734361]: https://bugzilla.redhat.com/show_bug.cgi?id=1734361
[rhbz#1741586]: https://bugzilla.redhat.com/show_bug.cgi?id=1741586
[rhbz#1743735]: https://bugzilla.redhat.com/show_bug.cgi?id=1743735

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] "pcs resource show --full" equivalent

2019-10-29 Thread Tomas Jelinek

Hi,

The command has been deprecated. It still works with the most recent pcs:

$ pcs resource show --full
Warning: This command is deprecated and will be removed. Please use 'pcs 
resource config' instead.

 Resource: d3 (class=ocf provider=pacemaker type=Dummy)
  Operations: migrate_from interval=0s timeout=20s 
(d3-migrate_from-interval-0s)

...

I guess you are using pcs 0.10.1, where the command had been removed. We 
reverted the removal in pcs 0.10.2.


Anyway, the preferred way is to use 'pcs resource config'.

The reason for deprecating the show command is that it does completely 
different things depending on the switches used, which may be confusing:

- if --full is used, configuration is displayed
- if --full is not used, status is displayed
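
With current pcs that roughly maps to:

  pcs resource config    # configuration (what 'pcs resource show --full' printed)
  pcs resource status    # status (what plain 'pcs resource show' printed)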


Regards,
Tomas

Dne 25. 10. 19 v 18:34 Marcelo Terres napsal(a):

Hello.

Just installed a Pacemaker 2.x and noticed the command "pcs resource 
show --full" is not available anymore.


Is there any equivalent command to get all data related to resources?

Something like this:

root@srv1 ~ $ pcs resource show --full
  Resource: virtual_ip (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=XXX.XXX.XXX.XXX cidr_netmask=25
   Operations: start interval=0s timeout=20s (virtual_ip-start-interval-0s)
               stop interval=0s timeout=20s (virtual_ip-stop-interval-0s)
               monitor interval=30s (virtual_ip-monitor-interval-30s)

Thanks.


--
Kind regards,




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] Q: The effect of using "default" attribute in RA metadata

2019-09-05 Thread Tomas Jelinek

Dne 03. 09. 19 v 11:27 Ulrich Windl napsal(a):

Hi!

Reading the RA API metadata specification, there is a "default" attribute for 
"parameter".
I wonder what the effect of specifying a default is: Is it purely documentation 
(and the RA has to take care it uses the same default value as in the 
metadata), or will the configuration tools actually use that value if the user 
did not specify a parameter value?


Pcs doesn't use the default values. If you don't specify a value for an 
option, pcs simply doesn't put that option into the CIB, leaving it to 
the RA to figure out the default value. This has the benefit of always 
following the default even if it changes. There is no plan to change this 
behavior.


Copying default values to the CIB has at least two disadvantages:
1) If the default in an RA ever changes, the change would have no effect 
- the value in the CIB would still be set to the previous default. To make 
the resource follow the defaults again, one would have to remove the option 
value afterwards, or a new pcs option controlling this behavior would have 
to be added.
2) When a value is the same as its default, it would be unclear whether the 
intention is to follow the default or whether the user set a value that 
happens to match the default.
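
This is easy to see with a dummy resource (a quick sketch):

  pcs resource create MyDummy ocf:pacemaker:Dummy
  pcs resource config MyDummy
  # no instance attributes are shown -- options you did not specify are simply
  # absent from the CIB and the agent falls back to its own defaults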



Regards,
Tomas



I would favor the second, of course ;-)

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Command to show location constraints?

2019-08-28 Thread Tomas Jelinek

Dne 28. 08. 19 v 6:07 Andrei Borzenkov napsal(a):

27.08.2019 18:24, Casey & Gina пишет:

Hi, I'm looking for a way to show just location constraints, if they exist, for a 
cluster.  I'm looking for the same data shown in the output of `pcs config` under the 
"Location Constraints:" header, but without all the rest, so that I can write a 
script that checks if there are any set.

The situation is that sometimes people will perform a failover with `pcs resource move 
--master `, but then forget to follow it up with `pcs resource clear 
`, and then it causes unnecessary failbacks later.  As we never want to 
have any particular node in the cluster preferred for this resource, I'd like to write a 
script that can automatically check for any location constraints being set and either alert 
or clear it automatically.

Thank you for any advice!



crm configure show type:location

to see all location constraints


pcs constraint location
to see all location constraints

pcs constraint location show
allows you to filter by resources or nodes


crm configure show type:location and cli-*

to see constraints that are create by crm_resource --move or --ban.


There is no alternative to this in pcs. You can add --full to the pcs 
commands above to make them show constraint ids and then look for ids 
starting with cli-.
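
As a rough sketch of such a check (the output format may differ between pcs 
versions, so verify the grep pattern on your systems):

  if pcs constraint location --full | grep -q 'id:cli-'; then
      echo "leftover move/ban constraints found - consider 'pcs resource clear'"
  fi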



Regards,
Tomas


But you can also simply run "crm_resource --clear" periodically, this
will remove all move/ban constraints on all resources.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] pcs 0.10.3 released

2019-08-26 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.3.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.3.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.3.zip

This is a bugfix release resolving issues introduced in pcs-0.10.2.

Complete change log for this release:
## [0.10.3] - 2019-08-23

### Fixed
- Fixed crashes in the `pcs host auth` command ([rhbz#1676957])
- Fixed id conflict with current bundle configuration in `pcs resource
  bundle reset` ([rhbz#1657166])
- Options starting with - and -- are no longer ignored for non-root
  users (broken since pcs-0.10.2) ([rhbz#1725183])
- Fixed crashes when pcs is configured that no rubygems are bundled in
  pcs package ([ghissue#208])
- Standby nodes running resources are listed separately in `pcs status
  nodes`
- Parsing arguments in the `pcs constraint order` and `pcs constraint
  colocation add` commands has been improved, errors which were
  previously silent are now reported ([rhbz#1734361])
- Fixed shebang correction in Makefile ([ghissue#206])
- Generate 256 bytes long corosync authkey, longer keys are not
  supported when FIPS is enabled ([rhbz#1740218])

### Changed
- Command `pcs resource bundle reset` no longer accepts the container
  type ([rhbz#1657166])


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Michal Pospisil, Ondrej Mular and Tomas Jelinek.

Cheers,
Tomas


[ghissue#206]: https://github.com/ClusterLabs/pcs/issues/206
[ghissue#208]: https://github.com/ClusterLabs/pcs/issues/208
[rhbz#1657166]: https://bugzilla.redhat.com/show_bug.cgi?id=1657166
[rhbz#1676957]: https://bugzilla.redhat.com/show_bug.cgi?id=1676957
[rhbz#1725183]: https://bugzilla.redhat.com/show_bug.cgi?id=1725183
[rhbz#1734361]: https://bugzilla.redhat.com/show_bug.cgi?id=1734361
[rhbz#1740218]: https://bugzilla.redhat.com/show_bug.cgi?id=1740218
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.9.168 released

2019-08-13 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.9.168.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.168.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.168.zip


Complete change log for this release:
## [0.9.168] - 2019-08-02

### Added
- It is now possible to disable pcsd SSL certificate being synced across
  the cluster during creating new cluster and adding a node to an
  existing cluster by setting `PCSD_SSL_CERT_SYNC_ENABLED` to `false` in
  pcsd config file ([rhbz#1665898])
- Length of DH key for SSL key exchange can be set in pcsd config file
- `pcs status` now shows failed and pending fencing actions and `pcs
  status --full` shows the whole fencing history. Pacemaker supporting
  fencing history is required. ([rhbz#1466088])
- `pcs stonith history` commands for displaying, synchronizing and
  cleaning up fencing history. Pacemaker supporting fencing history is
  required. ([rhbz#1595444])
- Support for clearing expired moves and bans of resources
  ([rhbz#1673829])
- HSTS is now enabled in pcsd ([rhbz#1558063])

### Fixed
- Pcs works even when PATH environment variable is not set
  ([rhbz#1671174])
- Fixed several "Unknown report" error messages
- Improved validation of qdevice heuristics exec options
  ([rhbz#1551663])
- Fixed crashes in the `pcs cluster auth` command ([rhbz#1676956])
- Pcs does not crash due to unhandled LibraryError exceptions
  ([rhbz#1710750])
- `pcs config restore` does not fail with "Invalid cross-device link"
  error any more ([rhbz#1712315])
- Fixed id conflict with current bundle configuration in `pcs resource
  bundle reset` ([rhbz#1725849])
- Standby nodes running resources are listed separately in `pcs status
  nodes` ([rhbz#1619253])
- Parsing arguments in the `pcs constraint order` and `pcs constraint
  colocation add` commands has been improved, errors which were
  previously silent are now reported ([rhbz#1500012])

### Changed
- Command `pcs resource bundle reset` no longer accepts the container type
  ([rhbz#1598197])


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Ondrej Mular and Tomas Jelinek

Cheers,
Tomas


[rhbz#1466088]: https://bugzilla.redhat.com/show_bug.cgi?id=1466088
[rhbz#1500012]: https://bugzilla.redhat.com/show_bug.cgi?id=1500012
[rhbz#1551663]: https://bugzilla.redhat.com/show_bug.cgi?id=1551663
[rhbz#1558063]: https://bugzilla.redhat.com/show_bug.cgi?id=1558063
[rhbz#1595444]: https://bugzilla.redhat.com/show_bug.cgi?id=1595444
[rhbz#1598197]: https://bugzilla.redhat.com/show_bug.cgi?id=1598197
[rhbz#1619253]: https://bugzilla.redhat.com/show_bug.cgi?id=1619253
[rhbz#1665898]: https://bugzilla.redhat.com/show_bug.cgi?id=1665898
[rhbz#1671174]: https://bugzilla.redhat.com/show_bug.cgi?id=1671174
[rhbz#1673829]: https://bugzilla.redhat.com/show_bug.cgi?id=1673829
[rhbz#1676956]: https://bugzilla.redhat.com/show_bug.cgi?id=1676956
[rhbz#1710750]: https://bugzilla.redhat.com/show_bug.cgi?id=1710750
[rhbz#1712315]: https://bugzilla.redhat.com/show_bug.cgi?id=1712315
[rhbz#1725849]: https://bugzilla.redhat.com/show_bug.cgi?id=1725849
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Adding HAProxy as a Resource

2019-07-25 Thread Tomas Jelinek

Hi,

It looks like your pacemaker binaries do not support systemd resources. 
If that is the case then there is nothing pcs can do about that. 
Pacemaker experts should be able to shed some light on this.


Regards,
Tomas


Dne 25. 07. 19 v 9:23 Somanath Jeeva napsal(a):

Hi

Systemd unit file is available for haproxy but the pcs resource standard 
command does not list systemd standard .

Also I am not using the pacemaker packages from redhat. I am using the packages 
downloaded from clusterlabs.




With Regards
Somanath Thilak J

-Original Message-
From: Tomas Jelinek 
Sent: Monday, July 15, 2019 5:58 PM
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Adding HAProxy as a Resource

Hi,

Do you have a systemd unit file for haproxy installed?
Does 'crm_resource --list-standards' print 'systemd'?
Does 'crm_resource --list-agents systemd' print 'haproxy'?
Note that when you use full agent name (that is including : ) it is case 
sensitive in pcs.

Regards,
Tomas


Dne 11. 07. 19 v 10:14 Somanath Jeeva napsal(a):

Hi

I am using the resource agents built from clusterlabs and when I add the 
systemd resource I am getting the below error .

$ sudo pcs resource create HAPROXY systemd:haproxy op monitor
interval=2s
Error: Agent 'systemd:haproxy' is not installed or does not provide
valid metadata: Metadata query for systemd:haproxy failed: -22, use
--force to override



With Regards
Somanath Thilak J

-Original Message-
From: Kristoffer Grönlund 
Sent: Thursday, July 11, 2019 1:22 PM
To: Cluster Labs - All topics related to open-source clustering
welcomed 
Cc: Somanath Jeeva 
Subject: Re: [ClusterLabs] Adding HAProxy as a Resource

On 2019-07-11 09:31, Somanath Jeeva wrote:

Hi All,

I am using HAProxy in my environment  which I plan to add to
pacemaker as resource. I see no RA available for that in resource agent.

Should I write a new RA or is there any way to add it to pacemaker as
a systemd service.


Hello,

haproxy works well as a plain systemd service, so you can add it as 
systemd:haproxy - that is, instead of an ocf: prefix, just put systemd:.

If you want the cluster to manage multiple, differently configured instances of 
haproxy, you might have to either create custom systemd service scripts for 
each one, or create an agent with parameters.

Cheers,
Kristoffer





With Regards
Somanath Thilak J


___
Manage your subscription:
https://protect2.fireeye.com/url?k=28466b53-74926310-28462bc8-86a1150
b
c3ba-bb674f3a9b557cbd=1=https%3A%2F%2Flists.clusterlabs.org%2Fmai
l
man%2Flistinfo%2Fusers

ClusterLabs home:
https://protect2.fireeye.com/url?k=4c5edd73-108ad530-4c5e9de8-86a1150
b c3ba-5da4e39ebe912cdf=1=https%3A%2F%2Fwww.clusterlabs.org%2F


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Adding HAProxy as a Resource

2019-07-15 Thread Tomas Jelinek

Hi,

Do you have a systemd unit file for haproxy installed?
Does 'crm_resource --list-standards' print 'systemd'?
Does 'crm_resource --list-agents systemd' print 'haproxy'?
Note that when you use full agent name (that is including : ) it is case 
sensitive in pcs.
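
Putting those checks together (a sketch, assuming the haproxy package with 
its systemd unit is installed):

  systemctl cat haproxy                    # is the unit file present?
  crm_resource --list-standards            # should include 'systemd'
  crm_resource --list-agents systemd | grep -i haproxy
  pcs resource create HAPROXY systemd:haproxy op monitor interval=2s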


Regards,
Tomas


Dne 11. 07. 19 v 10:14 Somanath Jeeva napsal(a):

Hi

I am using the resource agents built from clusterlabs and when I add the 
systemd resource I am getting the below error .

$ sudo pcs resource create HAPROXY systemd:haproxy op monitor interval=2s
Error: Agent 'systemd:haproxy' is not installed or does not provide valid 
metadata: Metadata query for systemd:haproxy failed: -22, use --force to 
override



With Regards
Somanath Thilak J

-Original Message-
From: Kristoffer Grönlund 
Sent: Thursday, July 11, 2019 1:22 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 

Cc: Somanath Jeeva 
Subject: Re: [ClusterLabs] Adding HAProxy as a Resource

On 2019-07-11 09:31, Somanath Jeeva wrote:

Hi All,

I am using HAProxy in my environment  which I plan to add to pacemaker
as resource. I see no RA available for that in resource agent.

Should I write a new RA or is there any way to add it to pacemaker as
a systemd service.


Hello,

haproxy works well as a plain systemd service, so you can add it as 
systemd:haproxy - that is, instead of an ocf: prefix, just put systemd:.

If you want the cluster to manage multiple, differently configured instances of 
haproxy, you might have to either create custom systemd service scripts for 
each one, or create an agent with parameters.

Cheers,
Kristoffer





With Regards
Somanath Thilak J


___
Manage your subscription:
https://protect2.fireeye.com/url?k=28466b53-74926310-28462bc8-86a1150b
c3ba-bb674f3a9b557cbd=1=https%3A%2F%2Flists.clusterlabs.org%2Fmail
man%2Flistinfo%2Fusers

ClusterLabs home:
https://protect2.fireeye.com/url?k=4c5edd73-108ad530-4c5e9de8-86a1150b
c3ba-5da4e39ebe912cdf=1=https%3A%2F%2Fwww.clusterlabs.org%2F


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [EXTERNAL] What's the best practice to scale-out/increase the cluster size?

2019-07-15 Thread Tomas Jelinek

Dne 11. 07. 19 v 16:01 Michael Powell napsal(a):

Thanks again for the feedback.  As a novice to Pacemaker, I am learning a great 
deal and have a great deal more to learn.

I'm afraid I was not precise in my choice of the term "stand-alone".  As you point out, 
our situation is really case b) "as the 2-node cluster but only one is up at the moment". 
 That said, there are cases where we would want to bring up a single node at a time.  These occur 
during maintenance periods, though, not during normal operation.  Hence we believe we can live with 
the STONITH/reboot.

I was not aware that corosync.conf could be reloaded while Corosync was 
running.  I'll review the Corosync documentation again.



Be aware that not all corosync settings can be changed while corosync is 
running. See corosync wiki[1] for details.


[1]: https://github.com/corosync/corosync/wiki/Config-file-values
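
For the settings that are runtime-changeable, a typical flow could look like 
this (a sketch; command availability depends on your pcs and corosync 
versions):

  # edit corosync.conf on one node, then push it to all nodes
  pcs cluster sync

  # ask all corosync instances to reload the on-disk configuration
  pcs cluster reload corosync        # or: corosync-cfgtool -R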


Regards,
Tomas



Regards,
   Michael

-Original Message-
From: Roger Zhou 
Sent: Thursday, July 11, 2019 1:16 AM
To: Cluster Labs - All topics related to open-source clustering welcomed 
; Michael Powell ; Ken Gaillot 

Cc: Venkata Reddy Chappavarapu 
Subject: [EXTERNAL] What's the best practice to scale-out/increase the cluster size? 
(was: [ClusterLabs] "node is unclean" leads to gratuitous reboot)


On 7/11/19 2:15 AM, Michael Powell wrote:

Thanks to you and Andrei for your responses.  In our particular situation, we 
want to be able to operate with either node in stand-alone mode, or with both 
nodes protected by HA.  I did not mention this, but I am working on upgrading 
our product from a version which used Pacemaker version 1.0.13 and Heartbeat to 
run under CentOS 7.6 (later 8.0).  The older version did not exhibit this 
behavior, hence my concern.

I do understand the "wait_for_all" option better, and now that I know why the 
"gratuitous" reboot is happening, I'm more comfortable with that behavior.  I think the 
biggest operational risk would occur following a power-up of the chassis.  If one node were 
significantly delayed during bootup, e.g. because of networking issues, the other node would issue 
the STONITH and reboot the delayed node.  That would be an annoyance, but it would be relatively 
infrequent.  Our customers almost always keep at least one node (and usually both nodes) 
operational 24/7.



2 cents,

I think your requirement is very clear. Still, I view this as a tricky design 
challenge. There are two different situations that easily fool people:

a) the situation of being stand-alone (one node, really not a cluster)
b) the situation as the 2-node cluster but only one is up at the moment

Without defining these concepts clearly and clarifying how they differ, 
people can mix them up and end up setting expectations against the wrong 
concept.

In your case, the configuration is a 2-node cluster. The log indicates the 
correct behavior for b), i.e. those STONITH actions are indeed by design.
The problem is that people set their expectations as if it were a).

With that in mind, wouldn't it be a cleaner design to run it as a stand-alone 
system first, then smoothly grow it to two nodes?

Furthermore, this prompts me to raise a question, mostly for corosync:

What's the best practice to scale-out/increase the cluster size?

One approach I can think of is to modify corosync.conf and reload it at run 
time. However, that does not look as elegant as the reverse direction, namely 
allow_downscale/auto_tie_breaker/last_man_standing from the advanced corosync 
feature set; see `man votequorum`.
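
(For the record, pcs wraps exactly that flow. A hedged example of growing a running cluster by one node, with a made-up host name and assuming the node has already been authenticated to pcs:

  pcs cluster node add node3.example.com --start

This updates corosync.conf on the existing nodes, pushes the configuration to the new node and starts the cluster services there. Shrinking works via 'pcs cluster node remove'.)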


Cheers,
Roger





Regards,
Michael

-Original Message-
From: Ken Gaillot 
Sent: Tuesday, July 09, 2019 12:42 PM
To: Cluster Labs - All topics related to open-source clustering
welcomed 
Cc: Michael Powell ; Venkata Reddy
Chappavarapu 
Subject: [EXTERNAL] Re: [ClusterLabs] "node is unclean" leads to
gratuitous reboot

On Tue, 2019-07-09 at 12:54 +, Michael Powell wrote:

I have a two-node cluster with a problem.  If I start


Not so much a problem as a configuration choice :)

There are trade-offs in any case.

- wait_for_all in corosync.conf: If set, this will make each starting node wait 
until it sees the other before gaining quorum for the first time. The downside 
is that both nodes must be up for the cluster to start; the upside is a clean 
starting point and no fencing.

- startup-fencing in pacemaker properties: If disabled, either node
can start without fencing the other. This is unsafe; if the other node
is actually active and running resources, but unreachable from the
newly up node, the newly up node may start the same resources, causing
split- brain. (Easier than you might think: consider taking a node
down for hardware maintenance, bringing it back up without a network,
then plugging it back into the network -- by that point it may have
brought up resources and starts causing havoc.)

- Start corosync on both nodes, then start pacemaker. This avoids start-up 
fencing since when pacemaker starts on either 

Re: [ClusterLabs] "node is unclean" leads to gratuitous reboot

2019-07-15 Thread Tomas Jelinek

Dne 09. 07. 19 v 14:54 Michael Powell napsal(a):

...


Here’s the configuration, from the first node –

[root@mgraid-16201289RN00023-0 bin]# pcs config

Cluster Name:
Corosync Nodes:
 mgraid-16201289RN00023-0 mgraid-16201289RN00023-1
Pacemaker Nodes:
 mgraid-16201289RN00023-0 mgraid-16201289RN00023-1

Resources:
 Master: ms-SS16201289RN00023
  Meta Attrs: clone-max=2 notify=true globally-unique=false target-role=Started
  Resource: SS16201289RN00023 (class=ocf provider=omneon type=ss)
   Attributes: ss_resource=SS16201289RN00023 ssconf=/var/omneon/config/config.16201289RN00023
   Operations: monitor interval=3s role=Master timeout=7s (SS16201289RN00023-monitor-3s)
               monitor interval=10s role=Slave timeout=7 (SS16201289RN00023-monitor-10s)
               stop interval=0 timeout=20 (SS16201289RN00023-stop-0)
               start interval=0 timeout=300 (SS16201289RN00023-start-0)
 Clone: mgraid-stonith-clone
  Resource: mgraid-stonith (class=stonith type=mgpstonith)
   Operations: monitor interval=0 timeout=20s (mgraid-stonith-monitor-interval-0)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: ms-SS16201289RN00023
    Constraint: ms-SS16201289RN00023-master-w1
      Rule: role=master score=100  (id:ms-SS16201289RN00023-master-w1-rule)
        Expression: #uname eq mgraid-16201289rn00023-0  (id:ms-SS16201289RN00023-master-w1-rule-expression)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 failure-timeout: 1min
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-recheck-interval: 1min
 dc-deadtime: 5s
 dc-version: 1.1.19-8.el7-c3c624ea3d
 have-watchdog: false
 last-lrm-refresh: 1562513532
 stonith-enabled: true

Quorum:
  Options:
    wait_for_all: 0



Interestingly, as you’ll note below, the two_node option is also set 
to 1, but is not reported as such above.


Pcs prints only options configurable by the 'pcs quorum update' command 
there: auto_tie_breaker, last_man_standing, last_man_standing_window and 
wait_for_all. Quorum device settings are printed there as well, but you 
do not have any set.
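
(For example, a hedged sketch of enabling wait_for_all through pcs; changing quorum options generally requires the cluster to be stopped, and pcs will complain if that is not the case:

  pcs quorum update wait_for_all=1
  pcs quorum config

The second command prints the resulting quorum options.)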


We plan to make pcs print more corosync settings / options in the 
pcs-0.10 branch.



Regards,
Tomas



Finally, here’s /etc/corosync/corosync.conf –

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 169.254.1.1
        mcastaddr: 239.255.1.1
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: on
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: on
    }
}

nodelist {
    node {
        ring0_addr: mgraid-16201289RN00023-0
        nodeid: 1
    }
    node {
        ring0_addr: mgraid-16201289RN00023-1
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
}

I’d appreciate any insight you can offer into this behavior, and any 
suggestions you may have.


Regards,

   Michael


 Michael Powell

     Sr. Staff Engineer

     15220 NW Greenbrier Pkwy

     Suite 290

     Beaverton, OR   97006

     T 503-372-7327    M 503-789-3019   H 503-625-5332


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PCSD - High Memory Usage

2019-06-27 Thread Tomas Jelinek

Hi,

We (pcs developers) do not have any statistics regarding pcsd memory 
consumption on a cluster with that many nodes and resources. So I cannot 
confirm if this is normal or not. That being said, it seems to me the 
memory usage is higher than it should be.


Over time, there have been reports of pcsd consuming unreasonably high 
amounts of memory. These issues are hard to reproduce and they do not 
happen every time, so we haven't been able to track them down and fix 
them. The problem may not even be in the pcsd code itself; it may be 
buried in rubygems, libraries or ruby. (There was a bug in ruby threads 
causing pcsd to use 100% CPU.)


We are working on overhauling pcsd to a new architecture while moving 
its codebase from Ruby to Python. The architecture gives us more control 
over pcsd worker threads which should help us lower pcsd memory 
footprint. It is a long term goal, however, definitely not something 
which would be ready in a few months.


For now, the only advice I have is to restart pcsd from time to time when 
you think its memory footprint is too high. Or, if you don't use the web UI, 
you can stop pcsd completely. The cluster does not depend on pcsd running; 
it is only needed for managing the cluster.
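
As a hedged example, a periodic restart could be as simple as a cron drop-in on each node (the schedule is arbitrary), and stopping pcsd entirely is just the usual systemd commands:

  # /etc/cron.d/pcsd-restart -- restart pcsd every Sunday at 03:00
  0 3 * * 0  root  systemctl restart pcsd

  # or, if the web UI is not needed at all
  systemctl stop pcsd
  systemctl disable pcsd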


I understand this is far from a solution but at this time I cannot offer 
anything else.



Tomas


Dne 21. 06. 19 v 13:32 Daniel Brant napsal(a):

Hi,

I'm running a cluster with 9 active nodes (4 in standby); on this 
cluster there are 258 registered resources. 99% of these are perl and java 
processes run via multi-target systemd units, 2 are heartbeat IPaddr2 and 
1 is a fence device. I am seeing very high memory usage in the pcsd process, 
2.5GB to 2.7GB on most nodes, with one running at 3.3GB.


Is this kind of memory usage to be expected when managing this volume of 
resources? I am running pacemaker version 1.1.19 on CentOS7.6.1810.


Any suggestions or advice is greatly appreciated.

Thanks,

Danny


This email is from the Press Association. For more information, see 
www.pressassociation.com. This email may contain confidential 
information. Only the addressee is permitted to read, copy, distribute 
or otherwise use this email or any attachments. If you have received it 
in error, please contact the sender immediately. Any opinion expressed 
in this email is personal to the sender and may not reflect the opinion 
of the Press Association. Any email reply to this address may be subject 
to interception or monitoring for operational reasons or for lawful 
business practices.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] pcs 0.10.2 released

2019-06-13 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.2.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.2.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.2.zip

This release brings support for adding, changing and removing corosync
links in an existing cluster and several corosync / knet related fixes.
There is also a new command for reconfiguring bundle resources.
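
A hedged sketch of the new commands (node names, addresses and IDs below are made up; see 'pcs cluster link --help' and 'pcs resource bundle --help' for the authoritative syntax):

  pcs cluster link add node1=192.168.200.1 node2=192.168.200.2 options linknumber=1
  pcs cluster link update 1 options link_priority=10
  pcs cluster link remove 1
  pcs resource bundle reset my-bundle container image=registry.example.com/app:latest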

Complete change log for this release:
## [0.10.2] - 2019-06-12

### Added
- Command `pcs config checkpoint diff` for displaying differences
  between two specified checkpoints ([rhbz#1655055])
- Support for resource instance attributes uniqueness check according to
  resource agent metadata ([rhbz#1665404])
- Command `pcs resource bundle reset` for resetting a bundle configuration
  ([rhbz#1657166])
- `pcs cluster setup` now checks if nodes' addresses match value of
  `ip_version` ([rhbz#1667053])
- Support for sbd option SBD\_TIMEOUT\_ACTION ([rhbz#1664828])
- Support for clearing expired moves and bans of resources
  ([rhbz#1625386])
- Commands for adding, changing and removing corosync links
  ([rhbz#1667058])

### Fixed
- Corosync config file parser updated and made more strict to match
  changes in corosync
- Allow non-root users to read quorum status (commands `pcs status
  corosync`, `pcs status quorum`, `pcs quorum device status`, `pcs
  quorum status`) ([rhbz#1653316])
- Removed command `pcs resource show` dropped from usage and man page
  ([rhbz#1656953])
- Put proper link options' names to corosync.conf ([rhbz#1659051])
- Fixed issues in configuring links in the 'create cluster' form in web
  UI ([rhbz#1664057])
- Pcs no longer removes empty `meta_attributes`, `instance_attributes`
  and other nvsets and similar elements from CIB. Such behavior was
  causing problems when pacemaker ACLs were in effect, leading to
  inability of pushing modified CIBs to pacemaker. ([rhbz#1659144])
- `ipv4-6` and `ipv6-4` are now valid values of `ip_version` in cluster
  setup ([rhbz#1667040])
- Crash when using unsupported options in commands `pcs status` and `pcs
  config` ([rhbz#1668422])
- `pcs resource group add` now fails gracefully instead of dumping an
  invalid CIB when a group ID is already occupied by a non-resource
  element ([rhbz#1668223])
- pcs no longer spawns unnecessary processes for reading known hosts
  ([rhbz#1676945])
- Lower load caused by periodical config files syncing in pcsd by making
  it sync less frequently ([rhbz#1676957])
- Improve logging of periodical config files syncing in pcsd
- Knet link option `ip_version` has been removed, it was never supported
  by corosync. Transport option `ip_version` is still in place.
  ([rhbz#1674005])
- Several bugs in linklist validation in `pcs cluster setup`
  ([rhbz#1667090])
- Fixed a typo in documentation (regardles -> regardless)
  ([rhbz#1660702])
- Fixed pcsd crashes when non-ASCII characters are present in systemd
  journal
- Pcs works even when PATH environment variable is not set
  ([rhbz#1673825])
- Fixed several "Unknown report" error messages
- Pcsd SSL certificates are no longer synced across cluster nodes when
  creating new cluster or adding new node to an existing cluster. To
  enable the syncing, set `PCSD_SSL_CERT_SYNC_ENABLED` to `true` in pcsd
  config. ([rhbz#1673822])
- Pcs now reports missing node names in corosync.conf instead of failing
  silently
- Fixed an issue where some pcs commands could not connect to cluster
  nodes over IPv6
- Fixed cluster setup problem in web UI when full domain names are used
  ([rhbz#1687965])
- Fixed inability to setup cluster in web UI when knet links are not
  specified ([rhbz#1687562])
- `--force` works correctly in `pcs quorum unblock` (broken since
  pcs-0.10.1)
- Removed `3des` from allowed knet crypto ciphers since it is actually
  not supported by corosync
- Improved validation of corosync options and their values
  ([rhbz#1679196], [rhbz#1679197])

### Changed
- Do not check whether watchdog is defined as an absolute path when
  enabling SBD. This check is not needed anymore as we are validating
  watchdog against list provided by SBD itself.

### Deprecated
- Command `pcs resource show`, removed in pcs-0.10.1, has been readded
  as deprecated to ease transition to its replacements. It will be
  removed again in future. [rhbz#1661059]
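
For reference, the intended replacements in pcs-0.10 are roughly the following (a hedged note; consult 'pcs resource --help' for details):

  pcs resource status    # resource status, formerly 'pcs resource show'
  pcs resource config    # resource configuration, formerly 'pcs resource show --full'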


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Ondrej Mular, Tomas Jelinek and Valentin Vidic.

Cheers,
Tomas


[rhbz#1625386]: https://bugzilla.redhat.com/show_bug.cgi?id=1625386
[rhbz#1653316]: https://bugzilla.redhat.com/show_bug.cgi?id=1653316
[rhbz#1655055]: https://bugzilla.redhat.com/show_bug.cgi?id=1655055
[rhbz#1656953]: https://bugzilla.redhat.com/show_bug.cgi?id=1656953
[rhbz#1657166]: https://bugzilla.redhat.com/show_bug.cgi?id=1657166
[rhbz#1659051]: https://bugzilla.redhat.com/show_bug.cgi?id=1659051
[rhbz#1659144]: https://bugzilla.redhat.com/show_bug.cgi?id=

Re: [ClusterLabs] Antw: Re: Question on permissions for pcsd ghost files

2019-04-23 Thread Tomas Jelinek

Dne 23. 04. 19 v 13:26 Ulrich Windl napsal(a):

Tomas Jelinek  schrieb am 23.04.2019 um 12:36 in

Nachricht
:

The files are listed as ghost files in order to let rpm know they belong
to pcs but are not distributed in rpm packages. Those files are created
by pcsd in runtime. I guess the 000 permissions come from the fact those
files are not present in rpm packages.


My guess is it's just bad packaging: I have an RPM myself that introduces a %ghost,
and it has permissions:
%ghost %config(missingok) %verify(not md5 mtime size) %attr(0644,root,root)
/etc/%{name}.conf


We'll fix that in the next pcs build, then.

Thanks!
Tomas



Regards,
Ulrich



The real permissions you have look OK to me as long as /var/lib/pcsd has
700. Files pcsd.cookiesecret, pcsd.crt and pcsd.key should not be
executable, but it does not matter that much. We fixed that in pcs-0.9.165.
The fix doesn't change permissions of existing files, though.
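
If you want to tighten an existing installation by hand, something along these lines should do (a suggestion based on the paths discussed above, not an official requirement):

  chmod 700 /var/lib/pcsd
  chmod 600 /var/lib/pcsd/pcsd.key /var/lib/pcsd/pcsd.crt /var/lib/pcsd/pcsd.cookiesecret
  chmod 600 /var/lib/pcsd/tokens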


Regards,
Tomas


Dne 19. 04. 19 v 21:20 Hayden,Robert napsal(a):

Working through an audit and need to determine what the expected
permissions are for the following files.

[root@techval13]# rpm ‑V pcs

.M...  c /var/lib/pcsd/pcs_settings.conf

.M...  c /var/lib/pcsd/pcs_users.conf

.M...  c /var/lib/pcsd/pcsd.cookiesecret

.M...  c /var/lib/pcsd/pcsd.crt

.M...  c /var/lib/pcsd/pcsd.key

.M...  c /var/lib/pcsd/tokens

Looking at the RPM spec, these appear to be ghost files with permissions
set to 000 in the spec.

[root@techval13]# rpm ‑q ‑‑dump pcs | grep /var/lib/pcsd/pcs_settings.conf

/var/lib/pcsd/pcs_settings.conf 0 1541089158  010 root root 1 0 0 X

Currently, the permissions after a normal installation are listed in the
“first” column from my custom report output.  The second column is the
“expected” permissions from the RPM spec.

644 | 000 | /var/lib/pcsd/pcs_settings.conf |
pcs‑0.9.165‑6.0.1.el7.x86_64

644 | 000 | /var/lib/pcsd/pcs_users.conf | pcs‑0.9.165‑6.0.1.el7.x86_64

700 | 000 | /var/lib/pcsd/pcsd.cookiesecret |
pcs‑0.9.165‑6.0.1.el7.x86_64

700 | 000 | /var/lib/pcsd/pcsd.crt | pcs‑0.9.165‑6.0.1.el7.x86_64

700 | 000 | /var/lib/pcsd/pcsd.key | pcs‑0.9.165‑6.0.1.el7.x86_64

600 | 000 | /var/lib/pcsd/tokens | pcs‑0.9.165‑6.0.1.el7.x86_64

Any help or guidance would be greatly appreciated.


Thanks

Robert

CONFIDENTIALITY NOTICE This message and any included attachments are
from Cerner Corporation and are intended only for the addressee. The
information contained in this message is confidential and may constitute
inside or non‑public information under international, federal, or state
securities laws. Unauthorized forwarding, printing, copying,
distribution, or use of such information is strictly prohibited and may
be unlawful. If you are not the addressee, please promptly delete this
message and notify the sender of the delivery error by e‑mail or you may
call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1)
(816)221‑1024.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
