[ClusterLabs] sbd v1.5.2

2023-01-09 Thread Klaus Wenninger
Hi sbd - developers & users!

Thanks to everybody for contributing to tests and
further development.

The only functional change is the first item in the list below,
and even that 'just' refuses startup in a case where the config
wouldn't have led to a successful cluster startup anyway.

Improved logs/build/test should make things more convenient
and less error-prone.


Changes since 1.5.1

- fail startup if pacemaker integration is disabled while
  SBD_SYNC_RESOURCE_STARTUP is conflicting (+ hint to overcome)
- improve logs
  - when logging state of SBD_PACEMAKER tell it is just that as
    this might still be overridden via cmdline options
  - log a warning if SBD_PACEMAKER is overridden by -P or -PP option
  - do not warn about startup syncing with pacemaker integration disabled
  - when watchdog-device is busy give a hint on who is hogging it
- improve build environment
  - have --with-runstatedir overrule --runstatedir
  - use new package name for pacemaker devel on opensuse
  - make config location configurable for man-page-creation
  - reverse alloc/de-alloc order to make gcc-12 static analysis happy
- improve test environment
  - have image-files in /dev/shm to ensure they are in memory and
    sbd opening the files with O_SYNC doesn't trigger unnecessary
    syncs on a heavily loaded test-machine;
    fall back to /tmp if /dev/shm doesn't exist
  - wrapping away libaio and usage of device-mapper for block-device
    simulation can now be passed into make via
    SBD_USE_DM & SBD_TRANSLATE_AIO
  - have variables that configure test-environment be printed
    out prior to running tests
  - finally ensure we clean up the environment when interrupted by a
    signal (bash should have done it with just setting EXIT handler -
    but avoiding bashisms might come in handy one day)
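
As a rough illustration of the last item (this is not the actual sbd
test script): bash runs an EXIT trap even when the shell is terminated
by a signal, but other /bin/sh implementations may not, so a portable
script can trap the signals explicitly. The names cleanup and workdir
are made up for this sketch.

```shell
#!/bin/sh
# Hypothetical sketch; names are illustrative and not taken from sbd.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/sbd-test.XXXXXX") || exit 1

cleanup() {
    # Reset the EXIT trap first so cleanup does not run a second time.
    trap - EXIT
    rm -rf "$workdir"
}

# Normal exit path.
trap cleanup EXIT
# Signal paths: clean up, then exit with the conventional 128+signal status.
trap 'cleanup; exit 129' HUP
trap 'cleanup; exit 130' INT
trap 'cleanup; exit 143' TERM

# ... tests that create files under "$workdir" would run here ...
```

Resetting the EXIT trap inside cleanup keeps the function from running
twice when a signal handler calls it and then exits.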

Regards,
Klaus
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Resource validation [was: multiple resources - pgsqlms - and IP(s)]

2023-01-09 Thread Jehan-Guillaume de Rorthais via Users
Hi,

I definitely have some work/improvements to do on the pgsqlms agent, but
there are still some details I'd like to discuss below.

On Fri, 6 Jan 2023 16:36:19 -0800
Reid Wahl wrote:

> On Fri, Jan 6, 2023 at 3:26 PM Jehan-Guillaume de Rorthais via Users wrote:
>>
>> On Wed, 4 Jan 2023 11:15:06 +0100
>> Tomas Jelinek wrote:
>>
>>> On 04. 01. 23 at 8:29, Reid Wahl wrote:
>>>> On Tue, Jan 3, 2023 at 10:53 PM lejeczek via Users wrote:
>>>>>
>>>>> On 03/01/2023 21:44, Ken Gaillot wrote:
>>>>>> On Tue, 2023-01-03 at 18:18 +0100, lejeczek via Users wrote:
[...]
>>>>>>> Not related - Is this an old bug?:
>>>>>>>
>>>>>>> -> $ pcs resource create pgsqld-apps ocf:heartbeat:pgsqlms
>>>>>>> bindir=/usr/bin pgdata=/apps/pgsql/data op start timeout=60s
>>>>>>> op stop timeout=60s op promote timeout=30s op demote
>>>>>>> timeout=120s op monitor interval=15s timeout=10s
>>>>>>> role="Master" op monitor interval=16s timeout=10s
>>>>>>> role="Slave" op notify timeout=60s meta promotable=true
>>>>>>> notify=true master-max=1 --disable
>>>>>>> Error: Validation result from agent (use --force to override):
>>>>>>>  ocf-exit-reason:You must set meta parameter notify=true
>>>>>>> for your master resource
>>>>>>> Error: Errors have occurred, therefore pcs is unable to continue
>>>>>> pcs now runs an agent's validate-all action before creating a
>>>>>> resource. In this case it's detecting a real issue in your command.
>>>>>> The options you have after "meta" are clone options, not meta options
>>>>>> of the resource being cloned. If you just change "meta" to "clone" it
>>>>>> should work.
>>>>> Nope. Exact same error message.
>>>>> If I remember correctly there was a bug specifically
>>>>> pertained to 'notify=true'
>>>>
>>>> The only recent one I can remember was a core dump.
>>>> - Bug 2039675 - pacemaker coredump with ocf:heartbeat:mysql resource
>>>> (https://bugzilla.redhat.com/show_bug.cgi?id=2039675)
>>>>
>>>> From a quick inspection of the pcs resource validation code
>>>> (lib/pacemaker/live.py:validate_resource_instance_attributes_via_pcmk()),
>>>> it doesn't look like it passes the meta attributes. It only passes the
>>>> instance attributes. (I could be mistaken.)
>>>>
>>>> The pgsqlms resource agent checks the notify meta attribute's value as
>>>> part of the validate-all action. If pcs doesn't pass the meta
>>>> attributes to crm_resource, then the check will fail.
>>>
>>> Pcs cannot pass meta attributes to crm_resource, because there is
>>> nowhere to pass them to.  
>>
>> But, they are passed as environment variable by Pacemaker, why pcs couldn't
>> set them as well when running the agent?  
> 
> pcs uses crm_resource to run the validate-all action. crm_resource
> doesn't provide a way to pass in meta attributes -- only instance
> attributes. Whether crm_resource should provide that is another
> question...

But they can be set as environment variables when invoking crm_resource, and
the resource agent inherits them when it is executed:

  # This fails
  # crm_resource --validate                            \
      --class ocf --agent pgsqlms --provider heartbeat \
      --option pgdata=/var/lib/pgsql/15/data           \
      --option bindir=/usr/pgsql-15/bin
  Operation validate (ocf:heartbeat:pgsqlms) returned 5 (not installed: You must set meta parameter notify=true for your "master" resource)
  ocf-exit-reason:You must set meta parameter notify=true for your "master" resource
  crm_resource: Error performing operation: Not installed

  # This fails on a different mandatory setup
  # OCF_RESKEY_CRM_meta_notify=1                       \
    crm_resource --validate                            \
      --class ocf --agent pgsqlms --provider heartbeat \
      --option pgdata=/var/lib/pgsql/15/data           \
      --option bindir=/usr/pgsql-15/bin
  Operation validate (ocf:heartbeat:pgsqlms) returned 5 (not installed: You must set meta parameter master-max=1 for your "master" resource)
  ocf-exit-reason:You must set meta parameter master-max=1 for your "master" resource
  crm_resource: Error performing operation: Not installed

  # This succeeds
  # OCF_RESKEY_CRM_meta_notify=1                       \
    OCF_RESKEY_CRM_meta_master_max=1                   \
    crm_resource --validate                            \
      --class ocf --agent pgsqlms --provider heartbeat \
      --option pgdata=/var/lib/pgsql/15/data           \
      --option bindir=/usr/pgsql-15/bin
  Operation validate (ocf:heartbeat:pgsqlms) returned 0 (ok)
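
The pattern in the runs above could be wrapped in a small helper. This is
only a sketch, assuming meta attributes map to OCF_RESKEY_CRM_meta_<name>
with dashes turned into underscores, as the notify and master-max cases
above suggest. meta_env_name and with_meta are hypothetical names, not part
of pcs, crm_resource or Pacemaker.

```shell
#!/bin/sh
# Hypothetical helpers; not part of pcs, crm_resource or Pacemaker.

# Map a meta attribute name (e.g. "master-max") to the environment
# variable name the agent reads (e.g. OCF_RESKEY_CRM_meta_master_max).
meta_env_name() {
    printf 'OCF_RESKEY_CRM_meta_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}

# Run a command with the given name=value meta attributes exported.
# The subshell keeps the exports from leaking into the caller.
with_meta() {
    (
        while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
            export "$(meta_env_name "${1%%=*}")=${1#*=}"
            shift
        done
        [ "$#" -gt 0 ] && shift   # drop the "--" separator
        "$@"
    )
}

# Possible usage, mirroring the last run above:
#   with_meta notify=1 master-max=1 -- \
#     crm_resource --validate \
#       --class ocf --agent pgsqlms --provider heartbeat \
#       --option pgdata=/var/lib/pgsql/15/data \
#       --option bindir=/usr/pgsql-15/bin
```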

>>> As defined in OCF 1.1, only instance attributes
>>> matter for validation, see
>>> https://github.com/ClusterLabs/OCF-spec/blob/main/ra/1.1/resource-agent-api.md#check-levels
>>
>> It doesn't state clearly that meta 

[ClusterLabs] fence-agents v4.12.0

2023-01-09 Thread Oyvind Albrigtsen

ClusterLabs is happy to announce fence-agents v4.12.0.

The source code is available at:
https://github.com/ClusterLabs/fence-agents/releases/tag/v4.12.0

The most significant enhancements in this release are:
- new fence agents:
 - fence_ecloud

- bugfixes and enhancements:
 - all agents: unify ssl parameters to avoid having to use --ssl when using
   --ssl-secure/--ssl-insecure for some agents
 - build: add FENCETMPDIR for state files
 - build: don't rm PKG_CHECK_VAR.m4 when running maintainer-clean
 - build: fix parallel build of lib/
 - build: make xml-check: ignore detected paths in *_file parameters not
   matching saved metadata
 - configure: check for google-auth instead of deprecated oauth2client
 - fencing: add ability to set bool parameters to 0 or false
 - fencing: add plug_separator parameter to be able to specify one that isn't
   part of the plug name(s)
 - fencing: add source_env()
 - spec: fix python3-suds dependency having changed name on opensuse 16+
 - fence_apc/fence_ilo_moonshot: add missing "import logging"
 - fence_apc: add support for firmware version 7 (#475)
 - fence_cdu: add 8i support (#471)
 - fence_gce: add httplib2 to try/except: pass
 - fence_gce: add timeouts and failure options (#458)
 - fence_gce: add user agent to API requests (#491)
 - fence_gce: inform that SSLError might be caused by old versions of httplib2
 - fence_gce: make zone optional for get_nodes_list (#487)
 - fence_ibm_powervs: add support for proxy, private API servers and get token
   via API key (#490)
 - fence_ibm_powervs: improve defaults based on testing
 - fence_ibm_vpc: add proxy support
 - fence_ibm_vpc: add token cache support
 - fence_ibm_vpc: remove unused "instance" parameter and make limit optional
 - fence_ibmz: add option --load-on-activate
 - fence_kubevirt: take default namespace from context
 - fence_lpar: fix missing import logging, use fail_usage
 - fence_lpar: only output additional error output on DEBUG level
 - fence_lpar: support comanaged LPARs
 - fence_openstack: add --ssl-insecure
 - fence_openstack: add support for reading config from clouds.yaml and openrc
 - fence_openstack: allow using base os ssl cacert when cacert is not
   specified
 - fence_raritan: also allow pure port number, not only system1/outletX string
   (#473)
 - fence_sbd: improve error handling
 - fence_scsi/fence_mpath: add suppress-errors option (#484)
 - fence_virt: clarify usage of ip= for vsock listener
 - fence_virt: add debug print for static map check
 - fence_virt: add note that reboot-action doesn't power on nodes that are
   powered off
 - fence_virt: allow groups to only specify vm_name without UUID
 - fence_virt: drop last qmf bits (rhel6 era)
 - fence_virt: fix clang build
 - fence_virt: fix cpg plugin build
 - fence_virt: fix serial debug output
 - fence_virt: fix tcp plugin to properly pass info to acl check
 - fence_virt: update man page for serial listener in serial mode
 - fence_virt: update man page to cover all serial listener configs
 - fence_virtd: add info about using multiple uuid/ip entries for groups
 - fence_virtd: add link and non-user socket example to man page
 - fence_virtd: add support for named groups
 - fence_virtd: set secure file permissions for fence_virtd.conf and key file
   if they are not mode 600
 - fence_wti: increase login timeout to avoid random timeouts
 - fence_zvm: deprecate agent
 - fence_zvmip: add --disable-ssl
 - fence_zvmip: show unable to connect error instead of full stacktrace, e.g.
   when not using --ssl for SSL devices

The full list of changes for fence-agents is available at:
https://github.com/ClusterLabs/fence-agents/compare/v4.11.0...v4.12.0

Everyone is encouraged to download and test the new release.
We do many regression tests and simulations, but we can't cover all
possible use cases, so your feedback is important and appreciated.

Many thanks to all the contributors to this release.


Best,
The fence-agents maintainers


[ClusterLabs] announcement: schedule for resource-agents release 4.12.0

2023-01-09 Thread Oyvind Albrigtsen

Hi,

This is a tentative schedule for resource-agents v4.12.0:
4.12.0-rc1: Jan 18
4.12.0: Jan 25

Full list of changes:
https://github.com/ClusterLabs/resource-agents/compare/v4.11.0...main

I've modified the corresponding milestones at:
https://github.com/ClusterLabs/resource-agents/milestones

If there's anything you think should be part of the release
please open an issue, a pull request, or a bugzilla, as you see
fit.

If there's anything that hasn't received due attention, please
let us know.

Finally, if you can help with resolving issues, consider yourself
invited to do so. There are currently 141 issues and 50 pull
requests still open.


Cheers,
Oyvind Albrigtsen

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/developers

ClusterLabs home: https://www.clusterlabs.org/