Re: [ClusterLabs] SLES12 SP4: Same error logged in two different formats

2019-08-26 Thread Jan Pokorný
On 26/08/19 08:16 +0200, Ulrich Windl wrote:
> While inspecting the logs to improve my own RA, I noticed that one
> error is logged in two different formats: with a literal "\n" and
> with a space:
> lrmd[7278]:   notice: prm_isr_ds3_monitor_0:108007:stderr [ mkdir: cannot 
> create directory '/run/isredir': File exists ]
> crmd[7281]:   notice: h11-prm_isr_ds3_monitor_0:964 [ mkdir: cannot create 
> directory '/run/isredir': File exists\n ]
> 
> (The RA suffers from a race condition when multiple monitors are
> launched in parallel, and each one tries to create the missing
> directory; only the first one succeeds)

/me acting as a bot, adding synapses in case they are helpful:
https://lists.clusterlabs.org/pipermail/users/2019-February/025454.html
[never codified reentrancy of the agent-instances]
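
The race described above (several monitors each trying to create the same
missing directory) can usually be sidestepped by making the creation step
idempotent. The following is only a minimal sketch, not the RA's actual code:
the function name ensure_rundir and the demo path are hypothetical, and it
assumes the agent only needs the directory to exist afterwards.

```shell
#!/bin/sh
# Hypothetical helper: succeed whether this instance or a parallel one
# created the directory, and keep stderr quiet so nothing lands in the log.
ensure_rundir() {
    dir=$1
    # mkdir -p does not fail if the directory already exists; the -d check
    # is a fallback in case a concurrent creator still wins the race.
    mkdir -p "$dir" 2>/dev/null || [ -d "$dir" ]
}

# Demo: four concurrent callers, as with parallel monitor operations.
demo_dir="${TMPDIR:-/tmp}/isredir-demo.$$"
for i in 1 2 3 4; do
    ensure_rundir "$demo_dir" &
done
wait
[ -d "$demo_dir" ] && echo "directory present"
rmdir "$demo_dir"
```

With this shape no "File exists" message is produced at all, so the question
of how the trailing newline is logged would not arise for this particular
error.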

> The log message in question is created automatically from a failed
> mkdir in the shell script.
> 
> My guess is that the trailing "\n" should be removed from the
> message.
> 
> As the pacemaker on the machine is a bit outdated
> (pacemaker-1.1.19+20180928.0d2680780-1.8.x86_64), the problem might
> have been fixed already.

-- 
Jan (Poki)


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

[ClusterLabs] On crm shell' "configure": "show changed" for deletes

2019-08-26 Thread Ulrich Windl
Hi!

Working with crm in SLES12 SP4 I noticed that when you delete resources in "crm 
configure", a "show changed" just displays nothing.
How hard would it be to show commands like "delete xyz" for each deleted 
resource?

Regards,
Ulrich






[ClusterLabs] Antw: Re: Antw: Re: node name issues (Could not obtain a node name for corosync nodeid 739512332)

2019-08-26 Thread Ulrich Windl
>>> Andrei Borzenkov wrote on 26.08.2019 at 10:52:
> On Mon, Aug 26, 2019 at 9:59 AM Ulrich Windl wrote:
> 
>> Also see my earlier message. If adding the node name to corosync conf is
>> highly recommended, I wonder why SUSE's SLES procedure does not set it...
>>
> 
> If you mean ha-cluster-init/ha-cluster-join, it just invokes "crm
> cluster", so you may consider creating an issue for crmsh pointing to
> this discussion.

I didn't set up that cluster myself; a co-worker who is on holiday right
now did. I'll ask him when he's back...





Re: [ClusterLabs] Antw: Re: node name issues (Could not obtain a node name for corosync nodeid 739512332)

2019-08-26 Thread Andrei Borzenkov
On Mon, Aug 26, 2019 at 9:59 AM Ulrich Windl wrote:

> Also see my earlier message. If adding the node name to corosync conf is
> highly recommended, I wonder why SUSE's SLES procedure does not set it...
>

If you mean ha-cluster-init/ha-cluster-join, it just invokes "crm
cluster", so you may consider creating an issue for crmsh pointing to
this discussion.


[ClusterLabs] pcs 0.10.3 released

2019-08-26 Thread Tomas Jelinek

I am happy to announce the latest release of pcs, version 0.10.3.

Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.10.3.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.10.3.zip

This is a bugfix release resolving issues introduced in pcs-0.10.2.

Complete change log for this release:
## [0.10.3] - 2019-08-23

### Fixed
- Fixed crashes in the `pcs host auth` command ([rhbz#1676957])
- Fixed id conflict with current bundle configuration in `pcs resource
  bundle reset` ([rhbz#1657166])
- Options starting with - and -- are no longer ignored for non-root
  users (broken since pcs-0.10.2) ([rhbz#1725183])
- Fixed crashes when pcs is configured so that no rubygems are bundled
  in the pcs package ([ghissue#208])
- Standby nodes running resources are listed separately in `pcs status
  nodes`
- Parsing arguments in the `pcs constraint order` and `pcs constraint
  colocation add` commands has been improved, errors which were
  previously silent are now reported ([rhbz#1734361])
- Fixed shebang correction in Makefile ([ghissue#206])
- Generate a 256-byte corosync authkey; longer keys are not
  supported when FIPS is enabled ([rhbz#1740218])

### Changed
- Command `pcs resource bundle reset` no longer accepts the container
  type ([rhbz#1657166])
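
To illustrate the authkey size mentioned in the Fixed list, here is a minimal
sketch that writes exactly 256 random bytes to a file. It is an illustration
only and not how pcs generates the key internally; the function name make_key
and the demo path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write exactly 256 bytes of random data to the file
# given as $1, readable by the owner only.
make_key() {
    (
        umask 077
        dd if=/dev/urandom of="$1" bs=256 count=1 2>/dev/null
    )
}

keyfile="${TMPDIR:-/tmp}/authkey-demo.$$"
make_key "$keyfile"
wc -c < "$keyfile"    # reports the key size in bytes
rm -f "$keyfile"
```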


Thanks / congratulations to everyone who contributed to this release,
including Ivan Devat, Michal Pospisil, Ondrej Mular and Tomas Jelinek.

Cheers,
Tomas


[ghissue#206]: https://github.com/ClusterLabs/pcs/issues/206
[ghissue#208]: https://github.com/ClusterLabs/pcs/issues/208
[rhbz#1657166]: https://bugzilla.redhat.com/show_bug.cgi?id=1657166
[rhbz#1676957]: https://bugzilla.redhat.com/show_bug.cgi?id=1676957
[rhbz#1725183]: https://bugzilla.redhat.com/show_bug.cgi?id=1725183
[rhbz#1734361]: https://bugzilla.redhat.com/show_bug.cgi?id=1734361
[rhbz#1740218]: https://bugzilla.redhat.com/show_bug.cgi?id=1740218


[ClusterLabs] Antw: Re: node name issues (Could not obtain a node name for corosync nodeid 739512332)

2019-08-26 Thread Ulrich Windl
>>> Ken Gaillot wrote on 22.08.2019 at 17:38 in message
<9bd1f0de082ff66a2a9b14d1d80cc95d7eff4bac.ca...@redhat.com>:
> On Thu, 2019-08-22 at 09:07 +0200, Ulrich Windl wrote:
>> Hi!
>> 
>> When starting pacemaker (1.1.19+20181105.ccd6b5b10-3.10.1) on a node
>> that had been down for a while, I noticed some unexpected messages
>> about the node name:
>> 
>> pacemakerd:   notice: get_node_name:   Could not obtain a node name
>> for corosync nodeid 739512332
>> pacemakerd: info: crm_get_peer: Created entry
>> a21bf687-045b-4fd7-9340-0562ef595883/0x18752f0 for node (null)/739512332 (1 total)
>> pacemakerd: info: crm_get_peer: Node 739512332 has uuid
>> 739512332
>> 
>> It seems the UUID and node ID are mixed up, in the message at least...
> 
> "UUID" is a misnomer, for historical reasons. It was an actual UUID for
> heartbeat (originally the only supported cluster layer), but for
> corosync it's the node ID and for Pacemaker Remote nodes it's the node
> name.
> 
> Ironically the string after "Created entry" is an actual UUID but
> that's not the "node UUID", just an internal hash table id.
> 
> We should definitely update all those messages to reflect the current
> reality.

;-) +1 ("This cat's a dog for historical reasons")

[...]

>> cib:   notice: crm_update_peer_state_iter:  Node (null) state is
>> now member | nodeid=739512332 previous=unknown
>> source=crm_update_peer_proc
>> ...
>> 
>> This doesn't look right in my eyes.
> 
> Corosync by default provides only a corosync node ID when identifying
> nodes. The daemons have to learn the node names from cluster messages
> passed around by pacemaker. The exception: if "name:" is specified in
> corosync.conf, the daemons can learn the names at start-up.
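
As an illustration of the "name:" setting, the sketch below writes a sample
nodelist section in corosync.conf syntax and lists which node entries carry a
name. The addresses, node names, and the helper list_node_names are all
hypothetical; only the nodelist/node/ring0_addr/nodeid/name keys are taken
from corosync's configuration format, and a real file would contain further
sections (totem, quorum, logging).

```shell
#!/bin/sh
# Hypothetical sample; addresses and node names are made up for illustration.
sample_conf="${TMPDIR:-/tmp}/corosync-sample.$$.conf"
cat > "$sample_conf" <<'EOF'
nodelist {
    node {
        ring0_addr: 192.0.2.11
        nodeid: 1
        name: node1
    }
    node {
        ring0_addr: 192.0.2.12
        nodeid: 2
    }
}
EOF

# Print "address name" for every node block, or "(no name)" if the block
# has no "name:" key.
list_node_names() {
    awk '
        $1 == "node" && $2 == "{" { in_node = 1; addr = ""; name = ""; next }
        in_node && $1 == "ring0_addr:" { addr = $2 }
        in_node && $1 == "name:"       { name = $2 }
        in_node && $1 == "}" {
            print addr, (name == "" ? "(no name)" : name)
            in_node = 0
        }
    ' "$1"
}

list_node_names "$sample_conf"
rm -f "$sample_conf"
```

For the sample above this prints "192.0.2.11 node1" and "192.0.2.12 (no
name)"; the second entry is the kind of node whose name the daemons would
have to learn from cluster messages.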

Can't there be a CIB event like "node ID is available now", and all the
clients needing a node ID silently wait until it is available instead of
creating all that noise?

> 
> As for the "now online"/"now member", there are two stages of corosync
> membership: cluster membership (i.e. participating in the corosync
> token ring) and process group (CPG) membership (which is corosync's
> node-to-node messaging protocol). They generally happen very close to
> each other.


I thought CPG was something like a subset of the cluster.

[...]
>> I feel this mess with determining the node name is overly
>> complicated...
>> 
>> Regards,
>> Ulrich
> 
> Complicated, yes -- overly, depends on your point of view :)
> 
> Putting "name:" in corosync.conf simplifies things.

Also see my earlier message. If adding the node name to corosync conf is
highly recommended, I wonder why SUSE's SLES procedure does not set it...

Thanks for your insights!

Regards,
Ulrich
