[ClusterLabs] Antw: Re: Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Ulrich Windl
>>> Ken Gaillot  wrote on 08.01.2019 at 18:28 in message
<2ecefc63baa56a76a6eeca7c696fc7a1653eb620.ca...@redhat.com>:
> On Tue, 2019-01-08 at 17:23 +0100, Kristoffer Grönlund wrote:
>> On Tue, 2019-01-08 at 10:07 -0600, Ken Gaillot wrote:
>> > On Tue, 2019-01-08 at 10:30 +0100, Kristoffer Grönlund wrote:
>> > > On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
>> > > > 
>> > > 
>> > > Having all the tools able to produce XML output like cibadmin and
>> > > crm_mon would be good in general, I think. So that seems like a
>> > > good
>> > > proposal to me.
>> > > 
>> > > In the case of an error, at least in my experience just getting a
>> > > return code and stderr output is enough to make sense of it -
>> > > getting
>> > > XML on stderr in the case of an error wouldn't seem like
>> > > something
>> > > that
>> > > would add much value to me.
>> > 
>> > There are two benefits: it can give extended information (such as
>> > the
>> > text string that corresponds to a numeric exit status), and because
>> > it
>> > would also be used by any future REST API (which won't have
>> > stderr),
>> > API/CLI output could be parsed identically.
>> > 
>> 
>> Hm, am I understanding you correctly:
>> 
>> My sort-of vision for implementing a REST API has been to move all of
>> the core functionality out of the command line tools and into the C
>> libraries (I think we discussed something like a libpacemakerclient
>> before) - the idea is that the XML output would be generated on that
>> level?
>> 
>> If so, that is something that I am all for :)
> 
> Yes :) but this would not be an implementation, just an output format
> that a future implementation could use.
> 
> The idea for the future is that a C library would contain all the
> functionality, and the CLI tools and REST API daemon would just be thin
> wrappers for that. Both the CLI (with XML output option) and REST API
> would return identical output, so scripts/tools could be written that
> could easily use one or the other.

If the REST API is to provide a reliable interface, the output (actually
the data being returned) must be formalized ("machine-readable"). One part
could be a clear-text message intended for humans, but there must be more
than that.

(Recently I got to know a commercial REST API that seemed to be "knitted
with a hot needle", i.e. hastily thrown together: it is highly
inconsistent in parameter names, in parameter-passing methods, and in
parameter and response formats. The HTTP methods used (GET/POST/DELETE)
are applied inconsistently as well. This is exactly what we DO NOT want.)

> 
> But at this point, we're just talking about the output format, and
> implementing it for a few CLI commands as a demonstration. The first
> step in the journey.

One point not discussed so far (but important): How would authentication
between CLI tools and the REST API work? With authentication, none of the
user-side tools would have to run as root.

> 
>> Right now, we are experimenting with a REST API based on taking what
>> we
>> use in Hawk and moving that into an API server written in Go, and
>> just
>> calling crm_mon --as-xml to get status information that can be
>> exposed
>> via the API. Having that available in C directly and not having to
>> call
>> out to command line tools would be great and a lot cleaner:

Before going that way, I would revalidate the API with some of the criteria
mentioned above. A consistent, reliable and secure API is very much desired.

>> 
>> https://github.com/krig/hawk-apiserver 
>> https://github.com/hawk-ui/hawk-web-client 
>> 
>> Cheers,
>> Kristoffer
> 
> So it looks like you're using JSON for the results returned by an API
> query? That's part of the question here. I think we're more likely to
> go with XML, but you're already parsing XML from crm_mon, so that
> shouldn't pose any problems if you want to parse and reformat the CLI
> output.

Actually, as long as XML is used just to represent nested data (without a
clean DTD), JSON is preferable, because it's easier to parse. Maybe the
client could specify in the Accept header which format it wants to
receive...

> 
> I envision a future pacemaker API server offering multiple output
> formats based on request extension, e.g. /GET/resources/my-rsc-id.xml
> would return XML whereas my-rsc-id.json would return JSON, optionally
> with an additional .bz2 extension to compress the result. But that's
> all just dreaming at this point, and there are no concrete plans to
> implement it.

No: HTTP has a standard mechanism (the Accept header) for negotiating the
returned data type; don't add multiple URIs to specify the data type to
return. Otherwise that will be the start of a mess.
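
For illustration, a minimal sketch of such header-based negotiation in a
C HTTP server. This assumes a recent libmicrohttpd (0.9.71+, where
handlers return enum MHD_Result); the handler and payloads are made up,
not from any Pacemaker code:

#include <string.h>
#include <microhttpd.h>

/* Serve one resource in whichever representation the Accept header
 * asks for; the XML/JSON payloads here are placeholders. */
static enum MHD_Result
handle_request(void *cls, struct MHD_Connection *conn, const char *url,
               const char *method, const char *version,
               const char *upload_data, size_t *upload_size, void **ctx)
{
        const char *accept = MHD_lookup_connection_value(
                conn, MHD_HEADER_KIND, MHD_HTTP_HEADER_ACCEPT);
        const char *type = "application/xml";              /* default */
        const char *body = "<resource id=\"my-rsc-id\"/>";

        if (accept && strstr(accept, "application/json")) {
                type = "application/json";
                body = "{\"resource\": {\"id\": \"my-rsc-id\"}}";
        }

        struct MHD_Response *resp = MHD_create_response_from_buffer(
                strlen(body), (void *) body, MHD_RESPMEM_PERSISTENT);
        MHD_add_response_header(resp, MHD_HTTP_HEADER_CONTENT_TYPE, type);

        enum MHD_Result rv = MHD_queue_response(conn, MHD_HTTP_OK, resp);
        MHD_destroy_response(resp);
        return rv;
}

A client then picks the format with a plain "Accept: application/json"
request header, and the URI stays the same.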

> 
> The questions at this point are:
> 
> 1. What output format(s) should we offer? (likely just XML for now,
> with a design allowing for alternatives such as JSON in the future)
> 
> 2. What should the command-line interface be? (e.g. --output-as=xml
> --output-file=result.xml)

Command-line interface for what? The output of the command-line tools should

[ClusterLabs] Antw: Re: Trying to understand the default action of a fence agent

2019-01-08 Thread Ulrich Windl
>>> Ken Gaillot  wrote on 08.01.2019 at 17:55 in message:
> On Tue, 2019-01-08 at 07:35 -0600, Bryan K. Walton wrote:
>> Hi,
>> 
>> I'm building a two node cluster with Centos 7.6 and DRBD.  These
>> nodes
>> are connected upstream to two Brocade switches.  I'm trying to enable
>> fencing by using Digimer's fence_dlink_snmp script (
>> https://github.com/digimer/fence_dlink_snmp ).
>> 
>> I've renamed the script to fence_brocade_snmp and have 
>> created my stonith resources using the following syntax:
>> 
>> pcs -f stonith_cfg stonith create fenceStorage1-centipede \
>> fence_brocade_snmp pcmk_host_list=storage1-drbd ipaddr=10.40.1.1 \
>> community=xxx port=193 pcmk_off_action="off" \
>> pcmk_monitor_timeout=120s 
> 
> FYI pcmk_off_action="off" is the default
> 
> If you want the cluster to request an "off" command instead of a
> "reboot" when fencing a node, set the stonith-action cluster property
> to "off".
> 
>> When I run "stonith-admin storage1-drbd", from my other node, 
>> the switch ports do not get disabled.  However, when I run
>> "stonith_admin -F storage1-drbd", the switch ports DO get disabled.
> 
> The pacemaker CLI commands don't always say anything when invalid
> option combinations are used (fixing that is one of the many things on
> the "would-be-nice" list). Your first command does nothing (and the

IMHO it should be on the "required" list ;-)
In my C programs I usually emit an error message, set a flag, and continue
parsing the rest of the options (just to flag as many errors as possible,
not only the first one); then I quit if errors were found. Otherwise I
check the parameters gathered so far for errors and quit if any are found.
Those remaining errors are not syntax errors, then...

...option processing
        bad_arg(option, optarg);  /* report the bad option... */
        result = 1;               /* ...but keep on parsing */
...more options
if (result != 0) {
        if (result == 1)
                usage();          /* syntax errors: show usage */
        goto quit;
}
...check parameters and initialize
if (result != 0)
        goto quit;                /* parameter errors: already reported */
...now start doing something useful
...
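
As a compilable illustration of that pattern (the options and messages
are made up):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int opt, result = 0;
        const char *file = NULL;

        while ((opt = getopt(argc, argv, "f:v")) != -1) {
                switch (opt) {
                case 'f': file = optarg; break;
                case 'v': /* verbose */ break;
                default:
                        result = 1;  /* getopt() complained; keep parsing */
                }
        }
        if (result == 1) {  /* syntax errors: show usage and quit */
                fprintf(stderr, "usage: %s -f FILE [-v]\n", argv[0]);
                return EXIT_FAILURE;
        }
        if (file == NULL) {  /* check parameters after all options are in */
                fprintf(stderr, "%s: -f FILE is required\n", argv[0]);
                return EXIT_FAILURE;
        }
        /* now start doing something useful */
        return EXIT_SUCCESS;
}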

Regards,
Ulrich

> forthcoming 2.0.1 version will give a usage error). The second command
> is the correct command to fence a node (-F for "off" or -B for
> "reboot").
> 
>> If I run "pcs stonith fence storage1-drbd", from the other node, the
>> response is: "Node: storage1-drbd fenced", but, again, the switch
>> ports
>> are still enabled.  I'm forced to instead run: "pcs stonith fence
>> storage1-drbd --off" to get the ports to be disabled.
> 
> I believe "pcs stonith fence" by default sends a reboot command, so it
> sounds like your fence agent doesn't implement reboot (or implements it
> incorrectly, or perhaps a reboot is equivalent to disabling the ports
> then re-enabling them and so is not useful). I'd use stonith-action=off 
> as mentioned above.
> 
>> What I'm trying to figure out, is under what scenario should I see
>> the
>> ports actually get disabled?  My concern is that, for example, I can
>> stop the cluster on storage1-drbd, and the logs will show that the
>> fencing was successful, and then my resources get moved.  But when I
>> check on the switch ports that are connected to storage1-drbd, they
>> are
>> still enabled.  So, the node does not appear to be really fenced. 
>> 
>> Do I need to create my stonith resource differently to actually
>> disable
>> those ports?
>> 
>> Thank you for your time.  I am greatly appreciative.
>> 
>> Sincerely,
>> Bryan Walton
>> 
>> 
>> -- 
>> Bryan K. Walton   319-337-3877 
>> Linux Systems Administrator Leepfrog Technologies, Inc 
>> 
>> - End forwarded message -
>> 
> 





Re: [ClusterLabs] pcs 0.10.1 released

2019-01-08 Thread Ivan Devát




On 12/28/18 5:39 AM, digimer wrote:

On 2018-11-26 12:26 p.m., Tomas Jelinek wrote:

I am happy to announce the latest release of pcs, version 0.10.1.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.1.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.1.zip

This is the first final release of the pcs-0.10 branch.
Pcs-0.10 is the new main pcs branch supporting Corosync 3.x and
Pacemaker 2.x clusters while dropping support for older Corosync and
Pacemaker versions. Pcs-0.9, being in maintenance mode, continues to
support Corosync 1.x/2.x and Pacemaker 1.x clusters.

Main changes compared to 0.9 branch:
* Corosync 3.x and Kronosnet are supported while Corosync 2.x and older
  as well as CMAN are not
* Node names are now fully supported
* Pacemaker 2.x is supported while Pacemaker 1.x is not
* Promotable clone resources replaced master resources; creating master
  resources is no longer possible but managing existing master resources
  is supported
* Options starting with '-' and '--' are no longer accepted by commands
  for which those options have no effect
* Obsoleting parameters of resource and fence agents are now supported
  and preferred over deprecated parameters
* Several deprecated and / or undocumented pcs commands / options have
  been removed
* Python 3.6+ and Ruby 2.2+ are now required

Complete change log for this release against 0.9.163:
## [0.10.1] - 2018-11-23

### Removed
- Pcs-0.10 removes support for CMAN, Corosync 1.x, Corosync 2.x and
  Pacemaker 1.x based clusters. For managing those clusters use
  pcs-0.9.x.
- Pcs-0.10 requires Python 3.6 and Ruby 2.2, support for older Python
  and Ruby versions has been removed.
- `pcs resource failcount reset` command has been removed as `pcs
  resource cleanup` is doing exactly the same job. ([rhbz#1427273])
- Deprecated commands `pcs cluster remote-node add | remove` have been
  removed as they were replaced with `pcs cluster node add-guest |
  remove-guest`
- Ability to create master resources has been removed as they are
  deprecated in Pacemaker 2.x ([rhbz#1542288])
  - Instead of `pcs resource create ... master` use `pcs resource create
    ... promotable` or `pcs resource create ... clone promotable=true`
  - Instead of `pcs resource master` use `pcs resource promotable` or
    `pcs resource clone ... promotable=true`
- Deprecated --clone option from `pcs resource create` command
- Ability to manage node attributes with `pcs property set|unset|show`
  commands (using `--node` option). The same functionality is still
  available using `pcs node attribute` command.
- Undocumented version of the `pcs constraint colocation add` command,
  its syntax was `pcs constraint colocation add <source resource id>
  <target resource id> [score] [options]`
- Deprecated commands `pcs cluster standby | unstandby`, use `pcs node
  standby | unstandby` instead
- Deprecated command `pcs cluster quorum unblock` which was replaced by
  `pcs quorum unblock`
- Subcommand `pcs status groups` as it was not showing a cluster status
  but cluster configuration. The same functionality is still available
  using command `pcs resource group list`
- Undocumented command `pcs acl target`, use `pcs acl user` instead

### Added
- Validation for an inaccessible resource inside a bundle
  ([rhbz#1462248])
- Options to filter failures by an operation and its interval in `pcs
  resource cleanup` and `pcs resource failcount show` commands
  ([rhbz#1427273])
- Commands for listing and testing watchdog devices ([rhbz#1578891])
- Commands for creating promotable clone resources `pcs resource
  promotable` and `pcs resource create ... promotable` ([rhbz#1542288])
- `pcs resource update` and `pcs resource meta` commands change master
  resources to promotable clone resources because master resources are
  deprecated in Pacemaker 2.x ([rhbz#1542288])
- Support for the `promoted-max` bundle option replacing the `masters`
  option in Pacemaker 2.x ([rhbz#1542288])
- Support for OP_NO_RENEGOTIATION option when OpenSSL supports it
  (even with Python 3.6) ([rhbz#1566430])
- Support for container types `rkt` and `podman` into bundle commands
  ([rhbz#1619620])
- Support for promotable clone resources in pcsd and web UI
  ([rhbz#1542288])
- Obsoleting parameters of resource and fence agents are now supported
  and preferred over deprecated parameters ([rhbz#1436217])
- `pcs status` now shows failed and pending fencing actions and `pcs
  status --full` shows the whole fencing history. Pacemaker supporting
  fencing history is required. ([rhbz#1615891])
- `pcs stonith history` commands for displaying, synchronizing and
  cleaning up fencing history. Pacemaker supporting fencing history is
  required. ([rhbz#1620190])
- Validation of node existence in a cluster when creating location
  constraints ([rhbz#1553718])
- Command `pcs client local-auth` for authentication of pcs client
  against local pcsd. This is required when a non-root user wants to
  execute a command which requires root 

Re: [ClusterLabs] Trying to understand the default action of a fence agent

2019-01-08 Thread Bryan K. Walton
On Tue, Jan 08, 2019 at 10:55:09AM -0600, Ken Gaillot wrote:
> 
> FYI pcmk_off_action="off" is the default
> 
> If you want the cluster to request an "off" command instead of a
> "reboot" when fencing a node, set the stonith-action cluster property
> to "off".

Awesome! Thank you, Ken.  I don't know how I've missed this, up to now.
Setting this property is exactly what I needed.

Much obliged,
Bryan

-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 


Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Ken Gaillot
On Tue, 2019-01-08 at 17:23 +0100, Kristoffer Grönlund wrote:
> On Tue, 2019-01-08 at 10:07 -0600, Ken Gaillot wrote:
> > On Tue, 2019-01-08 at 10:30 +0100, Kristoffer Grönlund wrote:
> > > On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> > > > 
> > > 
> > > Having all the tools able to produce XML output like cibadmin and
> > > crm_mon would be good in general, I think. So that seems like a
> > > good
> > > proposal to me.
> > > 
> > > In the case of an error, at least in my experience just getting a
> > > return code and stderr output is enough to make sense of it -
> > > getting
> > > XML on stderr in the case of an error wouldn't seem like
> > > something
> > > that
> > > would add much value to me.
> > 
> > There are two benefits: it can give extended information (such as
> > the
> > text string that corresponds to a numeric exit status), and because
> > it
> > would also be used by any future REST API (which won't have
> > stderr),
> > API/CLI output could be parsed identically.
> > 
> 
> Hm, am I understanding you correctly:
> 
> My sort-of vision for implementing a REST API has been to move all of
> the core functionality out of the command line tools and into the C
> libraries (I think we discussed something like a libpacemakerclient
> before) - the idea is that the XML output would be generated on that
> level?
> 
> If so, that is something that I am all for :)

Yes :) but this would not be an implementation, just an output format
that a future implementation could use.

The idea for the future is that a C library would contain all the
functionality, and the CLI tools and REST API daemon would just be thin
wrappers for that. Both the CLI (with XML output option) and REST API
would return identical output, so scripts/tools could be written that
could easily use one or the other.

But at this point, we're just talking about the output format, and
implementing it for a few CLI commands as a demonstration. The first
step in the journey.

> Right now, we are experimenting with a REST API based on taking what
> we
> use in Hawk and moving that into an API server written in Go, and
> just
> calling crm_mon --as-xml to get status information that can be
> exposed
> via the API. Having that available in C directly and not having to
> call
> out to command line tools would be great and a lot cleaner:
> 
> https://github.com/krig/hawk-apiserver
> https://github.com/hawk-ui/hawk-web-client
> 
> Cheers,
> Kristoffer

So it looks like you're using JSON for the results returned by an API
query? That's part of the question here. I think we're more likely to
go with XML, but you're already parsing XML from crm_mon, so that
shouldn't pose any problems if you want to parse and reformat the CLI
output.

I envision a future pacemaker API server offering multiple output
formats based on request extension, e.g. /GET/resources/my-rsc-id.xml
would return XML whereas my-rsc-id.json would return JSON, optionally
with an additional .bz2 extension to compress the result. But that's
all just dreaming at this point, and there are no concrete plans to
implement it.

The questions at this point are:

1. What output format(s) should we offer? (likely just XML for now,
with a design allowing for alternatives such as JSON in the future)

2. What should the command-line interface be? (e.g. --output-as=xml
--output-file=result.xml)

3. What should the output schema look like? For example, how should we
represent the output of "stonith_admin --history" and "crm_resource --
list-operations"? The goal will be to have something general enough
that it can be easily adapted to any command, yet consistent enough to
be easily parsed.
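
For example, purely as a strawman (the element names here are invented
for illustration, not a proposed schema), a common envelope might look
like:

<pacemaker-result api-version="1.0" request="stonith_admin --history">
  <history>
    <fence-event target="node1" action="reboot" status="success"
                 completed="2019-01-08 10:00:00"/>
  </history>
  <status code="0" message="OK"/>
</pacemaker-result>

The idea being that every command would share the outer envelope and the
<status> element, with a command-specific payload inside.
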
-- 
Ken Gaillot 



Re: [ClusterLabs] Trying to understand the default action of a fence agent

2019-01-08 Thread Christopher Lumens
> The pacemaker CLI commands don't always say anything when invalid
> option combinations are used (fixing that is one of the many things on
> the "would-be-nice" list).

If you run into a nonsense option combination, file a bug and CC me on it
and I'll take a look at it.  I've been adding options here and there and
have tried to make sure my new options error out as appropriate.  I'd like
to take care of existing options, too, but there are so many that it's
hard to just wade in and tackle them all.

- Chris


Re: [ClusterLabs] Trying to understand the default action of a fence agent

2019-01-08 Thread Ken Gaillot
On Tue, 2019-01-08 at 07:35 -0600, Bryan K. Walton wrote:
> Hi,
> 
> I'm building a two node cluster with Centos 7.6 and DRBD.  These
> nodes
> are connected upstream to two Brocade switches.  I'm trying to enable
> fencing by using Digimer's fence_dlink_snmp script (
> https://github.com/digimer/fence_dlink_snmp ).
> 
> I've renamed the script to fence_brocade_snmp and have 
> created my stonith resources using the following syntax:
> 
> pcs -f stonith_cfg stonith create fenceStorage1-centipede \
> fence_brocade_snmp pcmk_host_list=storage1-drbd ipaddr=10.40.1.1 \
> community=xxx port=193 pcmk_off_action="off" \
> pcmk_monitor_timeout=120s 

FYI pcmk_off_action="off" is the default

If you want the cluster to request an "off" command instead of a
"reboot" when fencing a node, set the stonith-action cluster property
to "off".

> When I run "stonith-admin storage1-drbd", from my other node, 
> the switch ports do not get disabled.  However, when I run
> "stonith_admin -F storage1-drbd", the switch ports DO get disabled.

The pacemaker CLI commands don't always say anything when invalid
option combinations are used (fixing that is one of the many things on
the "would-be-nice" list). Your first command does nothing (and the
forthcoming 2.0.1 version will give a usage error). The second command
is the correct command to fence a node (-F for "off" or -B for
"reboot").

> If I run "pcs stonith fence storage1-drbd", from the other node, the
> response is: "Node: storage1-drbd fenced", but, again, the switch
> ports
> are still enabled.  I'm forced to instead run: "pcs stonith fence
> storage1-drbd --off" to get the ports to be disabled.

I believe "pcs stonith fence" by default sends a reboot command, so it
sounds like your fence agent doesn't implement reboot (or implements it
incorrectly, or perhaps a reboot is equivalent to disabling the ports
then re-enabling them and so is not useful). I'd use stonith-action=off 
as mentioned above.

> What I'm trying to figure out, is under what scenario should I see
> the
> ports actually get disabled?  My concern is that, for example, I can
> stop the cluster on storage1-drbd, and the logs will show that the
> fencing was successful, and then my resources get moved.  But when I
> check on the switch ports that are connected to storage1-drbd, they
> are
> still enabled.  So, the node does not appear to be really fenced. 
> 
> Do I need to create my stonith resource differently to actually
> disable
> those ports?
> 
> Thank you for your time.  I am greatly appreciative.
> 
> Sincerely,
> Bryan Walton
> 
> 
> -- 
> Bryan K. Walton   319-337-3877 
> Linux Systems Administrator Leepfrog Technologies, Inc 
> 
> - End forwarded message -
> 



[ClusterLabs] dlm_controld does not recover from failed lockspace join

2019-01-08 Thread Edwin Török
Hello,

We've seen an issue in production where DLM 4.0.7 gets "stuck" and is
unable to join more lockspaces. Other nodes in the cluster were able to
join new lockspaces, but not the one that node 1 was stuck on.
GFS2 was unaffected (the "stuck" lockspace was for a userspace control
daemon, but that's just luck; it could've been GFS2's lockspace).

I do not have repro steps for this yet, but analyzing the kernel and
dlm_controld logs I think I found the root cause:

dlm_controld[7104]: 10998 fence work wait for quorum
dlm_controld[7104]: 11000 xapi-clusterd-lockspace wait for quorum
dlm_controld[7104]: 14602 fence work wait for quorum
dlm_controld[7104]: 14604 xapi-clusterd-lockspace wait for quorum
[15419.173125] dlm: xapi-clusterd-lockspace: group event done -512 0
[15419.173135] dlm: xapi-clusterd-lockspace: group join failed -512 0
dlm_controld[7104]: 15366 process_uevent online@ error -17 errno 0
...
[16080.892629] dlm: xapi-clusterd-lockspace: group event done -512 0
[16080.892638] dlm: xapi-clusterd-lockspace: group join failed -512 0
[16080.893156] dlm: cannot start dlm_scand thread -4
dlm_controld[7104]: 16087 cdab491e-14c8-ab wait for quorum
...
dlm_controld[7104]: 18199 fence work wait for quorum
dlm_controld[7104]: 18201 xapi-clusterd-lockspace wait for quorum
[19551.164358] dlm: xapi-clusterd-lockspace: joining the lockspace group...
dlm_controld[7104]: 19320 open
"/sys/kernel/dlm/xapi-clusterd-lockspace/id" error -1 2
dlm_controld[7104]: 19320 open
"/sys/kernel/dlm/xapi-clusterd-lockspace/control" error -1 2
dlm_controld[7104]: 19320 open
"/sys/kernel/dlm/xapi-clusterd-lockspace/event_done" error -1 2
dlm_controld[7104]: 19321 open
"/sys/kernel/dlm/xapi-clusterd-lockspace/control" error -1 2
dlm_controld[7104]: 19321 open
"/sys/kernel/dlm/xapi-clusterd-lockspace/control" error -1 2
dlm_controld[7104]: 19495 process_uevent online@ error -17 errno 2
...
[19551.455848] dlm: invalid lockspace 2844031955 from 2 cmd 2 type 1
[19552.459852] dlm: invalid lockspace 2844031955 from 2 cmd 2 type 1

And on another host from the cluster:
[41373.794149] dlm: xapi-clusterd-lockspace: remote node 1 not ready

Errno 512 is ERESTARTSYS in the kernel.
Errno 17 is EEXIST, and looking through the source code it looks like it
is raised here in main.c:
if (!strcmp(act, "online@")) {
        ls = find_ls(argv[3]);

        if (ls) {
                rv = -EEXIST;
                goto out;
        }
find_ls() looks at a global lockspaces list, which AFAICT is only
ever added to, but never removed from:
dlm_controld/cpg.c: list_for_each_entry(ls, &lockspaces, list) {
dlm_controld/cpg.c: list_for_each_entry(ls, &lockspaces, list) {
dlm_controld/cpg.c: list_for_each_entry_safe(ls, safe, &lockspaces, list) {
dlm_controld/cpg.c: list_add(&ls->list, &lockspaces);
dlm_controld/cpg.c: list_for_each_entry(ls, &lockspaces, list)
dlm_controld/cpg.c: list_for_each_entry(ls, &lockspaces, list) {
dlm_controld/daemon_cpg.c:  list_for_each_entry(ls, &lockspaces, list) {
dlm_controld/dlm_daemon.h:EXTERN struct list_head lockspaces;
dlm_controld/main.c:list_for_each_entry(ls, &lockspaces, list) {
dlm_controld/main.c:list_for_each_entry(ls, &lockspaces, list) {
dlm_controld/main.c:if (daemon_quit && list_empty(&lockspaces)) {
dlm_controld/main.c:list_for_each_entry(ls, &lockspaces, list)
dlm_controld/member.c:  if (list_empty(&lockspaces)) {
dlm_controld/plock.c:   list_for_each_entry(ls, &lockspaces, list) {

So if joining the lockspace fails, then DLM would forever refuse to join
that lockspace (until the host is rebooted): DLM's list of lockspaces is
now out of sync with the kernel.

How should dlm_controld recover from such an error? Could it refresh its
list of lockspaces from the kernel if a join failed?
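
For instance, something like this (purely a hypothetical sketch based on
the list usage above; I haven't checked what teardown dlm_controld would
actually need beyond unlinking the entry):

/* Hypothetical: if the kernel reports that the lockspace join failed,
 * drop the stale entry so a later online@ uevent can retry the join
 * instead of hitting -EEXIST. */
static void purge_failed_lockspace(const char *name)
{
        struct lockspace *ls = find_ls(name);

        if (ls) {
                list_del(&ls->list);
                free(ls);
        }
}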

Best regards,
--Edwin



Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Kristoffer Grönlund
On Tue, 2019-01-08 at 10:07 -0600, Ken Gaillot wrote:
> On Tue, 2019-01-08 at 10:30 +0100, Kristoffer Grönlund wrote:
> > On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> > > 
> > Having all the tools able to produce XML output like cibadmin and
> > crm_mon would be good in general, I think. So that seems like a
> > good
> > proposal to me.
> > 
> > In the case of an error, at least in my experience just getting a
> > return code and stderr output is enough to make sense of it -
> > getting
> > XML on stderr in the case of an error wouldn't seem like something
> > that
> > would add much value to me.
> 
> There are two benefits: it can give extended information (such as the
> text string that corresponds to a numeric exit status), and because
> it
> would also be used by any future REST API (which won't have stderr),
> API/CLI output could be parsed identically.
> 

Hm, am I understanding you correctly:

My sort-of vision for implementing a REST API has been to move all of
the core functionality out of the command line tools and into the C
libraries (I think we discussed something like a libpacemakerclient
before) - the idea is that the XML output would be generated on that
level?

If so, that is something that I am all for :)

Right now, we are experimenting with a REST API based on taking what we
use in Hawk and moving that into an API server written in Go, and just
calling crm_mon --as-xml to get status information that can be exposed
via the API. Having that available in C directly and not having to call
out to command line tools would be great and a lot cleaner:

https://github.com/krig/hawk-apiserver
https://github.com/hawk-ui/hawk-web-client

Cheers,
Kristoffer



Re: [ClusterLabs] About the pacemaker

2019-01-08 Thread Ken Gaillot
On Tue, 2019-01-08 at 15:27 +0800, T. Ladd Omar wrote:
> Hey guys, I have a question: does Pacemaker have an event-notification
> interface that is realized by push? Recently I've wanted to do
> something extra in another process when resources are started or
> deleted, so I need a way to monitor resource events. ClusterMon and
> alerts both use external scripts for extra actions, but in my
> situation the specific process might not have been started yet. I hope
> Pacemaker itself could store the old events and keep updating them
> until the specific process starts and subscribes to Pacemaker, then
> pulls all the old events. Pacemaker could also push to it when new
> events come.

I would use alerts with alert_file.sh (with custom modifications if
desired) to record them to a file, then have your process look at that.
(Tip: if you only care about events since the last boot, put the file
in /run so you don't have to worry about cleaning it up.)

> The above is all I've thought of; maybe it is not accurate. Anyway, I
> need some advice.
> By the way, there is no deletion notification in ClusterMon or alerts,
> right?

Correct, configuration changes are not alerted. The only way I know of
to get configuration changes is to use the C API for update/replace
callbacks. It would also be possible to poll the configuration at
intervals and use crm_diff to compare them, but that's probably not any
easier.
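
Roughly like this (a sketch assuming pacemaker's libcib headers;
untested, and error handling omitted):

#include <glib.h>
#include <crm/cib.h>

/* Called whenever the CIB changes; msg carries the diff, which can be
 * inspected for resource additions/removals. */
static void
cib_changed(const char *event, xmlNode *msg)
{
        /* parse the diff here */
}

int
main(int argc, char **argv)
{
        GMainLoop *loop = g_main_loop_new(NULL, FALSE);
        cib_t *cib = cib_new();

        cib->cmds->signon(cib, "cib-watcher", cib_query);
        cib->cmds->add_notify_callback(cib, T_CIB_DIFF_NOTIFY, cib_changed);

        g_main_loop_run(loop);  /* callbacks fire from the main loop */
        return 0;
}

The callback fires on every CIB change; filtering the diff for resource
additions and removals would then detect creation/deletion.
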
-- 
Ken Gaillot 



Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Ken Gaillot
On Tue, 2019-01-08 at 10:30 +0100, Kristoffer Grönlund wrote:
> On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> > There has been some discussion in the past about generating more
> > machine-friendly output from pacemaker CLI tools for scripting and
> > high-level interfaces, as well as possibly adding a pacemaker REST
> > API.
> > 
> > I've filed an RFE BZ
> > 
> >  https://bugs.clusterlabs.org/show_bug.cgi?id=5376
> > 
> > to design an output interface that would suit these goals. An
> > actual
> > REST API is not planned at this point, but this would provide a key
> > component of any future implementation.
> 
> Having all the tools able to produce XML output like cibadmin and
> crm_mon would be good in general, I think. So that seems like a good
> proposal to me.
> 
> In the case of an error, at least in my experience just getting a
> return code and stderr output is enough to make sense of it - getting
> XML on stderr in the case of an error wouldn't seem like something
> that
> would add much value to me.

There are two benefits: it can give extended information (such as the
text string that corresponds to a numeric exit status), and because it
would also be used by any future REST API (which won't have stderr),
API/CLI output could be parsed identically.

> Cheers,
> Kristoffer
> 
> > 
> > The question is what machine-friendly output should look like. The
> > basic idea is: for commands like "crm_resource --constraints" or
> > "stonith_admin --history", what output format would be most useful
> > for
> > a GUI or other program to parse?
> > 
> > Suggestions welcome here and/or on the bz ...



Re: [ClusterLabs] Plea for a new DLM release and thoughts on its Pacemaker interface

2019-01-08 Thread David Teigland
On Tue, Jan 08, 2019 at 11:56:18AM +0100, wf...@niif.hu wrote:
> Hi David,
> 
> The DLM git repo accumulated a couple of patches over the 4.0.7 tag,
> would you mind cutting a new release for packaging?

ok

> Tangentially, would you be interested in an Autotoolized build system?
> The flag handling repeatedly gives me headaches, and I'd consider
> contributing that if you aren't opposed.

Thank you, but I'll decline; I really don't care for autotool building.

> And if DLM development continues with the current code base -- you
> mentioned new developments for recovering from network message loss, so
> I expect changes..?

I don't know what the status of that is, but it would mostly be kernel
changes.

>> And once we're at it: the STONITH helper's embedding of the SONAME of the
>> Pacemaker fencing library is pretty obscure and easy to miss on library
> transitions.  I guess it's done to avoid unconditionally pulling in
> Pacemaker libraries and their dependencies.  Don't you think this helper
> agent had better be a part of the Pacemaker CLI utilities instead?  In
> my opinion the ABI coupling is stronger than the textual interface
> dlm_controld uses.  But even if not, it would be possible to dynamically
> link the agent against the Pacemaker libraries and split it into a
> separate optional package to make all those dependencies avoidable on
> systems not using Pacemaker fencing.  If all parties agree, of course.

I wasn't aware of the difficulties.  I'm open to suggestions as long as
dlm_controld has a fence-agent-style binary to call.

Dave


[ClusterLabs] Trying to understand the default action of a fence agent

2019-01-08 Thread Bryan K. Walton
Hi,

I'm building a two node cluster with Centos 7.6 and DRBD.  These nodes
are connected upstream to two Brocade switches.  I'm trying to enable
fencing by using Digimer's fence_dlink_snmp script (
https://github.com/digimer/fence_dlink_snmp ).

I've renamed the script to fence_brocade_snmp and have 
created my stonith resources using the following syntax:

pcs -f stonith_cfg stonith create fenceStorage1-centipede \
fence_brocade_snmp pcmk_host_list=storage1-drbd ipaddr=10.40.1.1 \
community=xxx port=193 pcmk_off_action="off" \
pcmk_monitor_timeout=120s 

When I run "stonith-admin storage1-drbd", from my other node, 
the switch ports do not get disabled.  However, when I run
"stonith_admin -F storage1-drbd", the switch ports DO get disabled.

If I run "pcs stonith fence storage1-drbd", from the other node, the
response is: "Node: storage1-drbd fenced", but, again, the switch ports
are still enabled.  I'm forced to instead run: "pcs stonith fence
storage1-drbd --off" to get the ports to be disabled.

What I'm trying to figure out, is under what scenario should I see the
ports actually get disabled?  My concern is that, for example, I can
stop the cluster on storage1-drbd, and the logs will show that the
fencing was successful, and then my resources get moved.  But when I
check on the switch ports that are connected to storage1-drbd, they are
still enabled.  So, the node does not appear to be really fenced. 

Do I need to create my stonith resource differently to actually disable
those ports?

Thank you for your time.  I am greatly appreciative.

Sincerely,
Bryan Walton


-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 

- End forwarded message -

-- 
Bryan K. Walton   319-337-3877 
Linux Systems Administrator Leepfrog Technologies, Inc 


[ClusterLabs] Plea for a new DLM release and thoughts on its Pacemaker interface

2019-01-08 Thread wferi
Hi David,

The DLM git repo accumulated a couple of patches over the 4.0.7 tag,
would you mind cutting a new release for packaging?

Tangentially, would you be interested in an Autotoolized build system?
The flag handling repeatedly gives me headaches, and I'd consider
contributing that if you aren't opposed.  And if DLM development
continues with the current code base -- you mentioned new developments
for recovering from network message loss, so I expect changes..?

And once we're at it: the STONITH helper's embedding of the SONAME of the
Pacemaker fencing library is pretty obscure and easy to miss on library
transitions.  I guess it's done to avoid unconditionally pulling in
Pacemaker libraries and their dependencies.  Don't you think this helper
agent had better be a part of the Pacemaker CLI utilities instead?  In
my opinion the ABI coupling is stronger than the textual interface
dlm_controld uses.  But even if not, it would be possible to dynamically
link the agent against the Pacemaker libraries and split it into a
separate optional package to make all those dependencies avoidable on
systems not using Pacemaker fencing.  If all parties agree, of course.
-- 
Thanks,
Feri


Re: [ClusterLabs] Proposal for machine-friendly output from Pacemaker tools

2019-01-08 Thread Kristoffer Grönlund
On Mon, 2019-01-07 at 17:52 -0600, Ken Gaillot wrote:
> There has been some discussion in the past about generating more
> machine-friendly output from pacemaker CLI tools for scripting and
> high-level interfaces, as well as possibly adding a pacemaker REST
> API.
> 
> I've filed an RFE BZ
> 
>  https://bugs.clusterlabs.org/show_bug.cgi?id=5376
> 
> to design an output interface that would suit these goals. An actual
> REST API is not planned at this point, but this would provide a key
> component of any future implementation.

Having all the tools able to produce XML output like cibadmin and
crm_mon would be good in general, I think. So that seems like a good
proposal to me.

In the case of an error, at least in my experience just getting a
return code and stderr output is enough to make sense of it - getting
XML on stderr in the case of an error wouldn't seem like something that
would add much value to me.

Cheers,
Kristoffer

> 
> The question is what machine-friendly output should look like. The
> basic idea is: for commands like "crm_resource --constraints" or
> "stonith_admin --history", what output format would be most useful
> for
> a GUI or other program to parse?
> 
> Suggestions welcome here and/or on the bz ...


Re: [ClusterLabs] VirtualDomain & parallel shutdown

2019-01-08 Thread Jan Pokorný
On 27/11/18 14:35 +0100, Jan Pokorný wrote:
> On 27/11/18 12:29 +0200, Klecho wrote:
>> Big thanks for the answer, but in your workarounds I don't see a solution
>> for the following simple case:
>> 
>> I have a few VMs (VirtualDomain RA) and just want to stop a few of them,
>> not all.
>> 
>> While the first VM is shutting down (target-role=stopped), it starts some
>> slow update, which could take hours (because of this possible update case,
>> the stop timeout is set very large).
>> 
>> During these hours of update, no other VM can be stopped at all.
>> 
>> If this isn't avoidable, it could be quite a big flaw, because it blocks
>> basic functionality.
> 
> It looks like having transition "leaves", i.e. particular executive
> manipulations like stop/start operations, last on the order of tens of
> minutes or longer is not what pacemaker's design had in mind,
> as opposed to pushing asynchronicity to the extreme (at the cost
> of complexity of the "orthogonality/non-interference tests",
> I think).

Also note that, moreover, extended periods of time in the context
of executing particular OCF/LSB resource operations can result in
relatively serious troubles under some failure scenarios unless
the agents are written in a self-defensive manner (and carefully
tested in practice):

https://lists.clusterlabs.org/pipermail/users/2019-January/016045.html

-- 
Nazdar,
Jan (Poki)




[ClusterLabs] Stray started resource leakages (Was: [Problem] The crmd fails to connect with pengine.)

2019-01-08 Thread Jan Pokorný
On 02/01/19 15:43 +0100, Jan Pokorný wrote:
> On 28/12/18 05:51 +0900, renayama19661...@ybb.ne.jp wrote:
>> As a result, Pacemaker will stop without stopping the resource.
> 
> This might have serious consequences in some scenarios, perhaps
> unless some watchdog-based solution (SBD?) was used as a fencing
> of choice since it would not get defused just as the resource
> wasn't stopped, I think...

Just very recently, I realized that pacemaker is likely not
sufficiently vigorous, in part due to the simplicity of its design
constraints, in part due to neglect thereof, to prevent any such "stray
started resource" leaks that verge on resource-level split-brains, at
least in theory.

Take, for example, an OCF/LSB resource (hence with just approximated
monitoring capabilities by design) that takes unusually long to start.
What if pacemaker-execd (lrmd) crashes midway through bringing it up,
leaving the original resource process reparented to PID1?
Pacemakerd will restart this child daemon anew, resources will get
probed, but because the OCF/LSB resource in question is not started
yet (e.g. it double-forks, it does a lengthy initialization in between
the forks, only near the finish line it will create a pid file that is
also the only indicator for the respective monitor operation),
pacemaker on this node indicates to the peers this particular resource
is _not_ running locally, making them free to run it if DC decides
so.  That is, unless the start operation comes with an override of
"on-fail" default if this start-monitor pair would be evaluated as
a failed start at all (I don't know).  But what we are observing now
is an opportunity for resource-level split-brain to emerge; remember,
the resource on the original node, now under PID1's supervision, is
about to finish its initialization at any moment, and no more
probe/monitor is coming there (unless explicitly configured so)
to realize this disaster any time soon.

This theoretical observation makes the systemd class of resources (putting
nagios and upstart aside now for not having a look at them, and,
perhaps naively, assuming that things like a double-fencing are
relatively harmless -- it's meant to be downright idempotent when
the action is "off", unless it would collide with the parallel manual
intervention, indeed) the only one to universally and relatively safely
survive an isolated pacemaker-execd restart (even then, it might be
recommended to have systemd sitting on the ticking watchdog just in
case, since when it internally "asserts", no further actions are
possible till the machine is restarted; indeed, unless pacemaker
can capture this circumstance and panic on its own).

Alternatively, one needs to make sure the OCF/LSB agent's start
operation begins with creating what's usually called a lock file, so
that an after-restart probe in such a scenario will spot, in combination
with a missing pid file, that the resource is still coming to its start,
give it some time for the pid file to actually appear, and if not in time,
preferably trigger panic/self-fencing, since any
getting-hold-of-a-process-by-procfs-scan is a broken approach
(there's no snapshot semantics imposed with POSIX), especially
when there can be containers running on that host.

The other alternative in the current state of affairs and without
having OCF/LSB resources in use properly scrutinized (fact that
they start timely may be sufficient) is declaring PCMK_fail_fast=yes
in /etc/sysconfig/pacemaker or equivalent.

I do apologize beforehand for not having verified these scenarios
by hand; I wish I had the bandwidth for that.  Sadly, the failure
modes are far from being documented, which is best done along
creating and implementing the design (with a very desirable feedback
loop when running into particular corner cases), without the need
for reverse engineering (reverse grasping of the intentions prone
to misunderstanding) afterwards.

Keep calm, things have always been this way :-)

-- 
Nazdar,
Jan (Poki)

