[ClusterLabs] Potential deprecation: Node-attribute-based rules in operation meta-attributes

2024-04-02 Thread Ken Gaillot
Hi all,

I have recently been cleaning up Pacemaker's rule code, and came across
an inconsistency.

Currently, meta-attributes may have rules with date/time-based
expressions (the <date_expression> element). Node attribute
expressions (the <expression> element) are not allowed, with the
exception of operation meta-attributes (beneath an <op> or
<op_defaults> element).

I'd like to deprecate support for node attribute expressions for
operation meta-attributes in Pacemaker 2.1.8, and drop support in
3.0.0.

I don't think it makes sense to vary meta-attributes by node. For
example, if a clone monitor has on-fail="block" (to cease all actions
on instances everywhere) on one node and on-fail="stop" (to stop all
instances everywhere) on another node, what should the cluster do if
monitors fail on both nodes? It seems to me that it's more likely to be
confusing than helpful.
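
For reference, the construct in question would look something like the
following hypothetical CIB fragment (the resource, rule, and attribute
names here are invented for illustration):

```xml
<!-- An operation meta-attribute varied by a node attribute rule:
     on-fail="block" applies only where node attribute "site" is
     "paris". This is the construct proposed for deprecation. -->
<primitive id="my-rsc" class="ocf" provider="heartbeat" type="Dummy">
  <operations>
    <op id="my-rsc-monitor" name="monitor" interval="10s">
      <meta_attributes id="my-rsc-monitor-meta">
        <rule id="my-rsc-monitor-rule" score="INFINITY">
          <expression id="my-rsc-monitor-expr"
                      attribute="site" operation="eq" value="paris"/>
        </rule>
        <nvpair id="my-rsc-monitor-onfail"
                name="on-fail" value="block"/>
      </meta_attributes>
    </op>
  </operations>
</primitive>
```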

If anyone has a valid use case for node attribute expressions for
operation meta-attributes, now is the time to speak up!
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Potential deprecation: Disabling schema validation for the CIB

2024-04-02 Thread Ken Gaillot
Hi all,

Pacemaker uses an XML schema to prevent invalid syntax from being added
to the CIB. The CIB's "validate-with" option is typically set to a
version of this schema (like "pacemaker-3.9").

It is possible to explicitly disable schema validation by setting
validate-with to "none". This is clearly a bad idea since it allows
invalid syntax to be added, which will at best be ignored and at worst
cause undesired or buggy behavior.
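
For anyone checking their own clusters, something like the following
should reveal the current setting (a sketch assuming the standard
Pacemaker CLI tools; cibadmin's exact output layout may vary):

```shell
# Show the CIB root element; look for the validate-with attribute:
cibadmin --query | head -n 2

# If it reads validate-with="none", upgrade to the latest schema
# the installed Pacemaker supports instead of leaving validation off:
cibadmin --upgrade --force
```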

I'm thinking of deprecating the ability to use "none" in Pacemaker
2.1.8 and dropping support in 3.0.0. If anyone has a valid use case for
this feature, now is the time to speak up!
-- 
Ken Gaillot 


Re: [ClusterLabs] PostgreSQL server timelines offset after promote

2024-04-02 Thread FLORAC Thierry
Hi Jehan-Guillaume,

Thanks for your links, but I had already looked at them when using Pacemaker
for the first time (except for the last one).
Actually, I forgot to mention that PostgreSQL and Pacemaker run on a Debian
GNU/Linux system (latest "Bookworm" release); on reboot, Pacemaker is stopped
using "systemctl stop", and resources *seem* to be migrated correctly to the
promoted slave.
When the previous master is restarted, Pacemaker is actually restarted
automatically and instantly promoted as the new master; it *seems* that it is
at this moment that a new timeline is sometimes created on the backup server...

Don't you think that I should disable automatic restarting of Pacemaker on the 
master server, and handle promotion manually when a switch occurs?

Best regards,
Thierry

--
Thierry Florac

Resp. Pôle Architecture Applicative et Mobile
DSI - Dépt. Études et Solutions Tranverses
2 bis avenue du Général Leclerc - CS 30042
94704 MAISONS-ALFORT Cedex
Tél : 01 40 19 59 64 - 06 26 53 42 09
www.onf.fr



From: Jehan-Guillaume de Rorthais
Sent: Wednesday, 27 March 2024 09:04
To: FLORAC Thierry
Cc: Cluster Labs - All topics related to open-source clustering welcomed

Subject: Re: [ClusterLabs] PostgreSQL server timelines offset after promote

Hello Thierry,

On Mon, 25 Mar 2024 10:55:06 +
FLORAC Thierry  wrote:

> I'm trying to create a PostgreSQL master/slave cluster using streaming
> replication and pgsqlms agent. Cluster is OK but my problem is this : the
> master node is sometimes restarted for system operations, and the slave is
> then promoted without any problem ;

When you have to do some planned system operation, you **must** diligently
ask permission from Pacemaker. Pacemaker is the real owner of your resource.
It will react to any unexpected event, even a planned one. You should think
of it as a hidden colleague taking care of your resource.

There are various ways to deal with Pacemaker when you need to do some system
maintenance on your primary, depending on your constraints. Here are two
examples:

* ask Pacemaker to move the "promoted" role to another node
* then put the node in standby mode
* then do your admin tasks
* then unstandby your node: a standby should start on the original node
* optional: move back your "promoted" role to original node
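
With pcs, the first workflow might look roughly like this (the resource
and node names are placeholders, and the exact flag for moving the
promoted role differs between pcs versions):

```shell
pcs resource move pgsql-clone --promoted   # move the promoted role away
pcs node standby node1                     # stop all resources on node1
# ... perform the system maintenance on node1 ...
pcs node unstandby node1                   # a standby starts here again
pcs resource clear pgsql-clone             # drop the move constraint
```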

Or:

* put the whole cluster in maintenance mode
* then do your admin tasks
* then check everything works as the cluster expects
* then exit the maintenance mode

The second one might be tricky if Pacemaker finds some unexpected status or
event when exiting maintenance mode.
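
The second workflow, sketched with pcs (the property name is as in
current Pacemaker; do verify the cluster state before re-enabling):

```shell
pcs property set maintenance-mode=true    # Pacemaker stops managing
# ... perform the admin tasks ...
# check that resources are in the state Pacemaker expects, then:
pcs property set maintenance-mode=false   # management resumes
```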

You can find (old) examples of administrative tasks in:

* with pcs: https://clusterlabs.github.io/PAF/CentOS-7-admin-cookbook.html
* with crm: https://clusterlabs.github.io/PAF/Debian-8-admin-cookbook.html
* with "low level" commands:
  https://clusterlabs.github.io/PAF/administration.html

These docs updates are long overdue, sorry about that :(

Also, here is a hidden gist (that needs some updates as well):

https://github.com/ClusterLabs/PAF/tree/workshop/docs/workshop/fr


> after reboot, the old master is re-promoted, but I often get an error in
> slave logs :
>
>   FATAL:  la plus grande timeline 1 du serveur principal est derrière la
>   timeline de restauration 2
>
> which can be translated into English as:
>
>   FATAL: highest timeline 1 of the primary is behind recovery
>   timeline 2


This is unexpected. I wonder how Pacemaker is being stopped; it is supposed
to stop its resource gracefully. The promotion scores should be updated to
reflect that the local resource is no longer a primary, and PostgreSQL should
be demoted and then stopped. It is supposed to start as a standby after a
graceful shutdown.

Regards,