Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1

2016-11-04 Thread Ken Gaillot
On 11/04/2016 02:29 AM, Ulrich Windl wrote:
> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
> <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>> ClusterLabs is happy to announce the first release candidate for
>> Pacemaker version 1.1.16. Source code is available at:
>>
>> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1 
>>
>> The most significant enhancements in this release are:
>>
>> * rsc-pattern may now be used instead of rsc in location constraints, to
>> allow a single location constraint to apply to all resources whose names
>> match a regular expression. Sed-like %0 - %9 backreferences let
>> submatches be used in node attribute names in rules.
>>
>> * The new ocf:pacemaker:attribute resource agent sets a node attribute
>> according to whether the resource is running or stopped. This may be
>> useful in combination with attribute-based rules to model dependencies
>> that simple constraints can't handle.
> 
> I don't quite understand this: Isn't the state of a resource in the CIB status
> section anyway? If not, why not add it? So it would be readily available for
> anyone (rules, constraints, etc.).

This (hopefully) lets you model more complicated relationships.

For example, someone recently asked whether they could make an ordering
constraint apply only at "start-up" -- the first time resource A starts,
it does some initialization that B needs, but once that's done, B can be
independent of A.

For that case, you could group A with an ocf:pacemaker:attribute
resource. The important part is that the attribute is not set if A has
never run on a node. So, you can make a rule that B can run only where
the attribute is set, regardless of the value -- even if A is later
stopped, the attribute will still be set.
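
As a rough sketch of what such a rule could look like in the CIB, the
snippet below builds a location constraint that bans B from any node where
the attribute is not defined. The constraint/rule IDs and the attribute
name "A-has-run" are made up for illustration; use whatever name your
ocf:pacemaker:attribute resource is actually configured to set.

```python
import xml.etree.ElementTree as ET


def build_attribute_location_rule(rsc_id, attr_name):
    """Build a CIB-style location constraint that keeps rsc_id off
    every node where the node attribute attr_name is not defined."""
    # All IDs below are hypothetical; they only need to be unique in the CIB.
    constraint = ET.Element(
        "rsc_location",
        {"id": f"loc-{rsc_id}-needs-{attr_name}", "rsc": rsc_id},
    )
    rule = ET.SubElement(
        constraint,
        "rule",
        {"id": f"loc-{rsc_id}-needs-{attr_name}-rule", "score": "-INFINITY"},
    )
    # "not_defined" matches regardless of the attribute's value, so the
    # rule only cares whether the attribute has ever been set on the node.
    ET.SubElement(
        rule,
        "expression",
        {
            "id": f"loc-{rsc_id}-needs-{attr_name}-expr",
            "attribute": attr_name,
            "operation": "not_defined",
        },
    )
    return constraint


constraint = build_attribute_location_rule("B", "A-has-run")
print(ET.tostring(constraint, encoding="unicode"))
```

The printed XML is only meant to show the shape of the rule; I have not
tested it against a live cluster.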

Another possible use would be for a cron job that needs to know whether a
particular resource is running, where an attribute query is quicker and
easier than something like parsing crm_mon output or probing the service.

It's all theoretical at this point, and I'm not entirely sure those
examples would be useful :) but I wanted to make the agent available for
people to experiment with.

>> * Pacemaker's existing "node health" feature allows resources to move
>> off nodes that become unhealthy. Now, when using
>> node-health-strategy=progressive, a new cluster property
>> node-health-base will be used as the initial health score of newly
>> joined nodes (defaulting to 0, which is the previous behavior). This
>> allows cloned and multistate resource instances to start on a node even
>> if it has some "yellow" health attributes.
> 
> So the node health is more or less a "node score"? I don't understand the last
> sentence. Maybe give an example?

Yes, node health is a score that's added when deciding where to place a
resource. It does get complicated ...

Node health monitoring is optional, and off by default.

Node health attributes are set to red, yellow or green (outside
pacemaker itself -- either by a resource agent, or some external
process). As an example, let's say we have three node health attributes
for CPU usage, CPU temperature, and SMART error count.

With a progressive strategy, red and yellow are assigned some negative
score, and green is 0. In our example, let's say yellow gets a -10 score.

If any of our attributes are yellow, resources will avoid the node
(unless they have higher positive scores from something like stickiness
or a location constraint).

Normally, this is what you want, but if your resources are cloned on all
nodes, maybe you don't care if some attributes are yellow. In that case,
you can set node-health-base=20, so even if two attributes are yellow,
it won't prevent resources from running (20 + -10 + -10 = 0).

There is nothing in node-health-base that is specific to clones; that's
just the most likely use case.
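
The arithmetic above can be sketched as follows. This is a simplified
model for illustration, not Pacemaker's actual code: the attribute names
are invented, and only the yellow score of -10 comes from the example
above (the red score of -50 is an arbitrary assumption).

```python
# Score per colour for this example. Only yellow = -10 is taken from the
# text above; the red value is an arbitrary placeholder.
HEALTH_SCORES = {"green": 0, "yellow": -10, "red": -50}


def node_health_score(attributes, base=0, scores=HEALTH_SCORES):
    """Sum the scores of all health attributes on top of the base score,
    as the progressive strategy is described above."""
    return base + sum(scores[colour] for colour in attributes.values())


# Hypothetical attribute names for CPU usage, CPU temperature, SMART errors.
attrs = {"cpu-usage": "yellow", "cpu-temp": "yellow", "smart-errors": "green"}

print(node_health_score(attrs))           # -20: resources avoid the node
print(node_health_score(attrs, base=20))  # 0: node-health-base=20 offsets it
```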

>> * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not
>> properly passed to multistate resources with notification enabled. This
>> has been fixed. To help resource agents detect when the fix is
>> available, the CRM feature set has been incremented. (Whenever the
>> feature set changes, mixed-version clusters are supported only during
>> rolling upgrades -- nodes with an older version will not be allowed to
>> rejoin once they shut down.)
> 
> Where can I find a description of current "CRM feature sets"?
>
> Ulrich
> 
>>
>> * Watchdog-based fencing using sbd now works on remote nodes.
>>
>> * The build process now takes advantage of various compiler features
>> (RELRO, PIE, as-needed linking, etc.) that enhance security and start-up
>> performance. See the "Hardening flags" comments in the configure.ac file
>> for more details.
>>
>> * Python 3 compatibility: The Pacemaker project now targets
>> compatibility with both python 2 (versions 2.6 and later) and python 3
>> (versions 3.2 and later). All of the project's python code now meets
>> this target, with the exception of CTS, which is still

Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1

2016-11-04 Thread Ken Gaillot
On 11/04/2016 02:53 AM, Jan Pokorný wrote:
> On 04/11/16 08:29 +0100, Ulrich Windl wrote:
>> Ken Gaillot wrote on 03.11.2016 at 17:08 in
>> message <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>>> ClusterLabs is happy to announce the first release candidate for
>>> Pacemaker version 1.1.16. Source code is available at:
>>>
>>> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1 
>>>
>>> The most significant enhancements in this release are:
>>>
>>> [...]
>>>
>>> * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not
>>> properly passed to multistate resources with notification enabled. This
>>> has been fixed. To help resource agents detect when the fix is
>>> available, the CRM feature set has been incremented. (Whenever the
>>> feature set changes, mixed-version clusters are supported only during
>>> rolling upgrades -- nodes with an older version will not be allowed to
>>> rejoin once they shut down.)
>>
>> Where can I find a description of current "CRM feature sets"?
> 
> It was originally an internal-only version number that lets the cluster
> know which node has the oldest software and hence is predestined to be DC
> (in a rolling upgrade scenario).
> 
> Ken recently mapped these versions (together with LRMD protocol
> versions relevant in the context of pacemaker remote communication)
> to proper release versions: http://clusterlabs.org/wiki/ReleaseCalendar

There is also a description of how the cluster uses the CRM feature set
in the new upgrade documentation:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_rolling_node_by_node
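
To illustrate the idea of "the node with the oldest feature set", here is
a small sketch. It is not how Pacemaker implements DC election; the node
names and feature set values are hypothetical, and the only point is that
dotted versions have to be compared numerically rather than as strings.

```python
def parse_feature_set(text):
    """Turn a dotted feature set such as "3.0.10" into a tuple of ints
    so that it compares numerically (3.0.9 sorts before 3.0.10)."""
    return tuple(int(part) for part in text.split("."))


def oldest_node(feature_sets):
    """Return the name of the node with the lowest feature set.
    feature_sets maps node name -> dotted feature set string."""
    return min(feature_sets, key=lambda node: parse_feature_set(feature_sets[node]))


# Hypothetical mixed-version cluster in the middle of a rolling upgrade.
cluster = {"node1": "3.0.11", "node2": "3.0.10"}
print(oldest_node(cluster))  # node2
```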



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] DRBD demote/promote not called - Why? How to fix?

2016-11-04 Thread CART Andreas
Hi

I have a basic 2-node active/passive cluster with Pacemaker (1.1.14, pcs:
0.9.148) / CMAN (3.0.12.1) / Corosync (1.4.7) on RHEL 6.8.
This cluster runs NFS on top of DRBD (8.4.4).

Basically the system is working on both nodes and I can switch the resources
from one node to the other.
But switching resources to the other node does not work if I try to move just
one resource and have the others follow due to the colocation constraints.

From the logged messages I see that in this "failure case" there is NO attempt
to demote/promote the DRBD clone resource.

Here is my setup:
==
Cluster Name: clst1
Corosync Nodes:
 ventsi-clst1-sync ventsi-clst2-sync
Pacemaker Nodes:
 ventsi-clst1-sync ventsi-clst2-sync

Resources:
 Resource: IPaddrNFS (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=xxx.xxx.xxx.xxx cidr_netmask=24
  Operations: start interval=0s timeout=20s (IPaddrNFS-start-interval-0s)
  stop interval=0s timeout=20s (IPaddrNFS-stop-interval-0s)
  monitor interval=5s (IPaddrNFS-monitor-interval-5s)
 Resource: NFSServer (class=ocf provider=heartbeat type=nfsserver)
  Attributes: nfs_shared_infodir=/var/lib/nfsserversettings/ nfs_ip=xxx.xxx.xxx.xxx nfsd_args="-H xxx.xxx.xxx.xxx"
  Operations: start interval=0s timeout=40 (NFSServer-start-interval-0s)
  stop interval=0s timeout=20s (NFSServer-stop-interval-0s)
  monitor interval=10s timeout=20s (NFSServer-monitor-interval-10s)
 Master: DRBDClone
  Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
  Resource: DRBD (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=nfsdata
   Operations: start interval=0s timeout=240 (DRBD-start-interval-0s)
   promote interval=0s timeout=90 (DRBD-promote-interval-0s)
   demote interval=0s timeout=90 (DRBD-demote-interval-0s)
   stop interval=0s timeout=100 (DRBD-stop-interval-0s)
   monitor interval=1s timeout=5 (DRBD-monitor-interval-1s)
 Resource: DRBD_global_clst (class=ocf provider=heartbeat type=Filesystem)
  Attributes: device=/dev/drbd1 directory=/drbdmnts/global_clst fstype=ext4
  Operations: start interval=0s timeout=60 (DRBD_global_clst-start-interval-0s)
  stop interval=0s timeout=60 (DRBD_global_clst-stop-interval-0s)
  monitor interval=20 timeout=40 (DRBD_global_clst-monitor-interval-20)

Stonith Devices:
 Resource: ipmi-fence-clst1 (class=stonith type=fence_ipmilan)
  Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=yyy.yyy.yyy.yyy pcmk_host_check=static-list pcmk_host_list=ventsi-clst1-sync auth=password timeout=30 cipher=1
  Operations: monitor interval=60s (ipmi-fence-clst1-monitor-interval-60s)
 Resource: ipmi-fence-clst2 (class=stonith type=fence_ipmilan)
  Attributes: lanplus=1 login=foo passwd=bar action=reboot ipaddr=zzz.zzz.zzz.zzz pcmk_host_check=static-list pcmk_host_list=ventsi-clst2-sync auth=password timeout=30 cipher=1
  Operations: monitor interval=60s (ipmi-fence-clst2-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: ipmi-fence-clst1
    Disabled on: ventsi-clst1-sync (score:-INFINITY) (id:location-ipmi-fence-clst1-ventsi-clst1-sync--INFINITY)
  Resource: ipmi-fence-clst2
    Disabled on: ventsi-clst2-sync (score:-INFINITY) (id:location-ipmi-fence-clst2-ventsi-clst2-sync--INFINITY)
Ordering Constraints:
  start IPaddrNFS then start NFSServer (kind:Mandatory) (id:order-IPaddrNFS-NFSServer-mandatory)
  promote DRBDClone then start DRBD_global_clst (kind:Mandatory) (id:order-DRBDClone-DRBD_global_clst-mandatory)
  start DRBD_global_clst then start IPaddrNFS (kind:Mandatory) (id:order-DRBD_global_clst-IPaddrNFS-mandatory)
Colocation Constraints:
  NFSServer with IPaddrNFS (score:INFINITY) (id:colocation-NFSServer-IPaddrNFS-INFINITY)
  DRBD_global_clst with DRBDClone (score:INFINITY) (id:colocation-DRBD_global_clst-DRBDClone-INFINITY)
  IPaddrNFS with DRBD_global_clst (score:INFINITY) (id:colocation-IPaddrNFS-DRBD_global_clst-INFINITY)

Resources Defaults:
 resource-stickiness: INFINITY
Operations Defaults:
 timeout: 10s

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.14-8.el6-70404b0
 have-watchdog: false
 last-lrm-refresh: 1478277432
 no-quorum-policy: ignore
 stonith-enabled: true
 symmetric-cluster: true
==

Initial state is e.g. this (all resources at node1):

Online: [ ventsi-clst1-sync ventsi-clst2-sync ]

Full list of resources:

 ipmi-fence-clst1   (stonith:fence_ipmilan):    Started ventsi-clst2-sync
 ipmi-fence-clst2   (stonith:fence_ipmilan):    Started ventsi-clst1-sync
 IPaddrNFS  (ocf::heartbeat:IPaddr2):   Started ventsi-clst1-sync
 NFSServer  (ocf::heartbeat:nfsserver): Started ventsi-clst1-sync
 Master/Slave Set: DRBDClone [DRBD]
 Masters: [ ventsi-clst1-sync ]

Re: [ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1

2016-11-04 Thread Jan Pokorný
On 04/11/16 08:29 +0100, Ulrich Windl wrote:
> Ken Gaillot wrote on 03.11.2016 at 17:08 in
> message <8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
>> ClusterLabs is happy to announce the first release candidate for
>> Pacemaker version 1.1.16. Source code is available at:
>> 
>> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1 
>> 
>> The most significant enhancements in this release are:
>> 
>> [...]
>> 
>> * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not
>> properly passed to multistate resources with notification enabled. This
>> has been fixed. To help resource agents detect when the fix is
>> available, the CRM feature set has been incremented. (Whenever the
>> feature set changes, mixed-version clusters are supported only during
>> rolling upgrades -- nodes with an older version will not be allowed to
>> rejoin once they shut down.)
> 
> Where can I find a description of current "CRM feature sets"?

It was originally an internal-only version number that lets the cluster
know which node has the oldest software and hence is predestined to be DC
(in a rolling upgrade scenario).

Ken recently mapped these versions (together with LRMD protocol
versions relevant in the context of pacemaker remote communication)
to proper release versions: http://clusterlabs.org/wiki/ReleaseCalendar

-- 
Jan (Poki)




[ClusterLabs] Antw: Pacemaker 1.1.16 - Release Candidate 1

2016-11-04 Thread Ulrich Windl
>>> Ken Gaillot wrote on 03.11.2016 at 17:08 in message
<8af2ff98-05fd-a2c7-f670-58d0ff68e...@redhat.com>:
> ClusterLabs is happy to announce the first release candidate for
> Pacemaker version 1.1.16. Source code is available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.16-rc1 
> 
> The most significant enhancements in this release are:
> 
> * rsc-pattern may now be used instead of rsc in location constraints, to
> allow a single location constraint to apply to all resources whose names
> match a regular expression. Sed-like %0 - %9 backreferences let
> submatches be used in node attribute names in rules.
> 
> * The new ocf:pacemaker:attribute resource agent sets a node attribute
> according to whether the resource is running or stopped. This may be
> useful in combination with attribute-based rules to model dependencies
> that simple constraints can't handle.

I don't quite understand this: Isn't the state of a resource in the CIB status
section anyway? If not, why not add it? So it would be readily available for
anyone (rules, constraints, etc.).


> 
> * Pacemaker's existing "node health" feature allows resources to move
> off nodes that become unhealthy. Now, when using
> node-health-strategy=progressive, a new cluster property
> node-health-base will be used as the initial health score of newly
> joined nodes (defaulting to 0, which is the previous behavior). This
> allows cloned and multistate resource instances to start on a node even
> if it has some "yellow" health attributes.

So the node health is more or less a "node score"? I don't understand the last
sentence. Maybe give an example?


> 
> * Previously, the OCF_RESKEY_CRM_meta_notify_active_* variables were not
> properly passed to multistate resources with notification enabled. This
> has been fixed. To help resource agents detect when the fix is
> available, the CRM feature set has been incremented. (Whenever the
> feature set changes, mixed-version clusters are supported only during
> rolling upgrades -- nodes with an older version will not be allowed to
> rejoin once they shut down.)

Where can I find a description of current "CRM feature sets"?

Ulrich

> 
> * Watchdog-based fencing using sbd now works on remote nodes.
> 
> * The build process now takes advantage of various compiler features
> (RELRO, PIE, as-needed linking, etc.) that enhance security and start-up
> performance. See the "Hardening flags" comments in the configure.ac file
> for more details.
> 
> * Python 3 compatibility: The Pacemaker project now targets
> compatibility with both python 2 (versions 2.6 and later) and python 3
> (versions 3.2 and later). All of the project's python code now meets
> this target, with the exception of CTS, which is still python 2 only.
> 
> * The Pacemaker coding guidelines have been replaced by a more
> comprehensive addition to the documentation set, "Pacemaker
> Development". It is intended for developers working on the Pacemaker
> code base itself, rather than external code such as resource agents. A
> copy is viewable at
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Development/index.html
> 
> As usual, the release includes many bugfixes, including a fix for a
> serious security vulnerability (CVE-2016-7035). For a more detailed list
> of changes, see the change log:
> 
> https://github.com/ClusterLabs/pacemaker/blob/1.1/ChangeLog 
> 
> Everyone is encouraged to download, compile and test the new release. We
> do many regression tests and simulations, but we can't cover all
> possible use cases, so your feedback is important and appreciated. Due
> to the security fix, I intend to keep this release cycle short, so quick
> testing feedback is especially appreciated.
> 
> Many thanks to all contributors of source code to this release,
> including Andrew Beekhof, Bin Liu, Christian Schneider, Christoph Berg,
> David Shane Holden, Ferenc Wágner, Yan Gao, Hideo Yamauchi, Jan Pokorný,
> Ken Gaillot, Klaus Wenninger, Kostiantyn Ponomarenko, Kristoffer
> Grönlund, Lars Ellenberg, Masatake Yamato, Michal Koutný, Nakahira
> Kazutomo, Nate Clark, Nishanth Aravamudan, Oyvind Albrigtsen, Ruben
> Kerkhof, Tim Bishop, Vladislav Bogdanov and Yusuke Iida. Apologies if I
> have overlooked anyone.
> -- 
> Ken Gaillot 
> 



