Re: [ClusterLabs] crm resource trace
On Fri, 2022-10-21 at 13:05 +0200, Lentes, Bernd wrote:
> - On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:
>
> > This turned out to be interesting.
> >
> > In the first case, the resource history contains a start action and a
> > recurring monitor. The parameters to both change, so the resource
> > requires a restart.
> >
> > In the second case, the resource's history was apparently cleaned at
> > some point, so the cluster re-probed it and found it running. That
> > means its history contained only the probe and the recurring monitor.
> > Neither probe nor recurring monitor changes require a restart, so
> > nothing is done.
> >
> > It would probably make sense to distinguish between probes that found
> > the resource running and probes that found it not running. Parameter
> > changes in the former should probably be treated like start.
>
> Is that now a bug or by design ?

It was by design, though that aspect of it was questionable.

> And what is the conclusion of it all ?

From the rest of the thread, I suspect that this has been fixed in a later version, though I'm not sure which changes were relevant. A lot of work has been done on the digest code in the past couple years.

> Do a "crm resource cleanup" before each "crm resource [un]trace" ?
> And test everything with ptest before commit ?
>
> Bernd
--
Ken Gaillot
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
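The restart decision Ken describes above can be pictured with a minimal sketch (a simplification, not Pacemaker source code: the point is only that a restart is forced when a start action with changed parameters is in the recorded history, while probe and recurring-monitor changes are not):

```shell
# Simplified sketch of the scheduler's decision (assumption: the parameters
# of every recorded action differ from the configuration; only the action
# *types* in the history vary between the two cases).
decide_restart() {
    restart=no
    for op in "$@"; do
        case "$op" in
            start)   restart=yes ;;  # changed start parameters => full restart
            probe)   : ;;            # probe changes never force a restart
            monitor) : ;;            # a recurring monitor is just rescheduled
        esac
    done
    echo "$restart"
}

# Case 1: history still holds the original start action => restart.
decide_restart start monitor      # prints "yes"
# Case 2: history cleaned, then re-probed => only probe + monitor => nothing.
decide_restart probe monitor      # prints "no"
```

This is why a `crm resource cleanup` before the trace changes the outcome: it throws away the recorded start action, so there is nothing left in the history whose parameter change would mandate a restart.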
Re: [ClusterLabs] crm resource trace
On Mon, 2022-10-24 at 11:22 +0200, Klaus Wenninger wrote:
> On Mon, Oct 24, 2022 at 11:10 AM Xin Liang via Users <users@clusterlabs.org> wrote:
> > Hi Bernd,
> >
> > The behaviors between the SLE15SP4 and SLE12SP5 are different.
> >
> > On 12sp5:
> > run `crm_resource --cleanup --resource `, then the resource is not
> > restarted when trace/untrace
> > On 15sp4:
> > run `crm_resource --cleanup --resource `, then the resource still
> > restarted when trace/untrace
>
> Hmm ... thanks for the update!
> I do remember having reviewed some PRs dealing with digest but
> obviously not detailed enough to tell if some upstream change
> might have 'fixed' the issue.
> Maybe Ken can still tell from the top of his mind.

There were some very recent changes by Gao Yan just released with 2.1.5-rc1, to ensure resources are restarted if a parameter changes that was specified on an operation rather than the resource itself. SUSE may have backported that already.

> Klaus
>
> > From: Users on behalf of Lentes, Bernd
> > Sent: Monday, October 24, 2022 4:46 PM
> > To: Pacemaker ML
> > Subject: Re: [ClusterLabs] crm resource trace
> >
> > - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:
> >
> > > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
> > >
> > > Did you try a cleanup in between?
> >
> > When i do a cleanup before trace/untrace the resource is not restarted.
> > When i don't do a cleanup it is restarted.
> >
> > Bernd
--
Ken Gaillot
Re: [ClusterLabs] crm resource trace
On Mon, Oct 24, 2022 at 11:10 AM Xin Liang via Users wrote:
> Hi Bernd,
>
> The behaviors between the SLE15SP4 and SLE12SP5 are different.
>
> On 12sp5:
> - run `crm_resource --cleanup --resource `, then the resource is not
>   restarted when trace/untrace
> On 15sp4:
> - run `crm_resource --cleanup --resource `, then the resource still
>   restarted when trace/untrace

Hmm ... thanks for the update!
I do remember having reviewed some PRs dealing with digest but obviously not detailed enough to tell if some upstream change might have 'fixed' the issue.
Maybe Ken can still tell from the top of his mind.

Klaus

> From: Users on behalf of Lentes, Bernd <bernd.len...@helmholtz-muenchen.de>
> Sent: Monday, October 24, 2022 4:46 PM
> To: Pacemaker ML
> Subject: Re: [ClusterLabs] crm resource trace
>
> - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:
>
> > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
> >
> > Did you try a cleanup in between?
>
> When i do a cleanup before trace/untrace the resource is not restarted.
> When i don't do a cleanup it is restarted.
>
> Bernd
Re: [ClusterLabs] crm resource trace
Hi Bernd,

The behaviors between the SLE15SP4 and SLE12SP5 are different.

On 12sp5:
* run `crm_resource --cleanup --resource `, then the resource is not restarted when trace/untrace
On 15sp4:
* run `crm_resource --cleanup --resource `, then the resource still restarted when trace/untrace

From: Users on behalf of Lentes, Bernd
Sent: Monday, October 24, 2022 4:46 PM
To: Pacemaker ML
Subject: Re: [ClusterLabs] crm resource trace

- On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:

> On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
>
> Did you try a cleanup in between?

When i do a cleanup before trace/untrace the resource is not restarted.
When i don't do a cleanup it is restarted.

Bernd
Re: [ClusterLabs] crm resource trace
On Mon, Oct 24, 2022 at 10:46 AM Lentes, Bernd <bernd.len...@helmholtz-muenchen.de> wrote:
>
> - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:
>
> > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
> >
> > Did you try a cleanup in between?
>
> When i do a cleanup before trace/untrace the resource is not restarted.
> When i don't do a cleanup it is restarted.

Sry Bernd for not being explicit - did get it that far ;-)
Wanted to see if Xin Liang has tried the cleanup as well.

Klaus

> Bernd
Re: [ClusterLabs] crm resource trace
- On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:

> On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
>
> Did you try a cleanup in between?

When i do a cleanup before trace/untrace the resource is not restarted.
When i don't do a cleanup it is restarted.

Bernd
Re: [ClusterLabs] crm resource trace
On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users wrote:
> Hi Bernd,
>
> I got it, you are on SLE12SP5, and the crmsh version is
> crmsh-4.1.1+git.1647830282.d380378a-2.74.2.noarch, right?
>
> I try to reproduce this inconsistent behavior, add an IPaddr2 agent vip,
> run `crm resource trace vip` and `crm resource untrace vip`
>
> On each time, the resource vip will be restarted ("due to resource
> definition change")

Did you try a cleanup in between?

Klaus

> I can't see the resource don't restart when trace/untrace resource
>
> Regards,
> Xin
>
> From: Users on behalf of Xin Liang via Users
> Sent: Monday, October 24, 2022 10:29 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Cc: Xin Liang
> Subject: Re: [ClusterLabs] crm resource trace
>
> Hi Bernd,
>
> On which version you're running for crmsh and SLE?
>
> Regards,
> Xin
>
> From: Users on behalf of Lentes, Bernd <bernd.len...@helmholtz-muenchen.de>
> Sent: Monday, October 17, 2022 6:43 PM
> To: Pacemaker ML
> Subject: Re: [ClusterLabs] crm resource trace
>
> Hi,
>
> i try to find out why there is sometimes a restart of the resource and
> sometimes not.
> Unpredictable behaviour is something i expect from Windows, not from Linux.
> Here you see two "crm resource trace "resource"".
> In the first case the resource is restarted, in the second not.
> The command i used is identical in both cases.
> > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap > Fri Oct 14 19:05:51 CEST 2022 > INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/ > INFO: Trace set, restart vm-genetrap to trace non-monitor operations > > > == > > 1st try: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > Diff: --- 7.28974.3 2 > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: + > /cib: @epoch=28975, @num_updates=0 > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ > /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ > /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ > /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ > /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ > /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']: > > Oct 14 19:05:52 [25996] 
ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: > ++ > > Oct 14 19:05:52 [26001] ha-idg-1 crmd: info: > abort_transition_graph: Transition 791 aborted by > instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': > Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 > path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30'] > complete=true > Oct 14 19:05:52 [26001] ha-idg-1 crmd: notice: > do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | > input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_process_request:
Re: [ClusterLabs] crm resource trace
Hi Bernd, I got it, you are on SLE12SP5, and the crmsh version is crmsh-4.1.1+git.1647830282.d380378a-2.74.2.noarch, right? I try to reproduce this inconsistent behavior, add an IPaddr2 agent vip, run `crm resource trace vip` and `crm resource untrace vip` On each time, the resource vip will be restarted("due to resource definition change") I can't see the resource don't restart when trace/untrace resource Regards, Xin From: Users on behalf of Xin Liang via Users Sent: Monday, October 24, 2022 10:29 AM To: Cluster Labs - All topics related to open-source clustering welcomed Cc: Xin Liang Subject: Re: [ClusterLabs] crm resource trace Hi Bernd, On which version you're running for crmsh and SLE? Regards, Xin From: Users on behalf of Lentes, Bernd Sent: Monday, October 17, 2022 6:43 PM To: Pacemaker ML Subject: Re: [ClusterLabs] crm resource trace Hi, i try to find out why there is sometimes a restart of the resource and sometimes not. Unpredictable behaviour is someting i expect from Windows, not from Linux. Here you see two "crm resource trace "resource"". In the first case the resource is restarted , in the second not. The command i used is identical in both cases. 
ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap Fri Oct 14 19:05:51 CEST 2022 INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/ INFO: Trace set, restart vm-genetrap to trace non-monitor operations == 1st try: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: Diff: --- 7.28974.3 2 Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: + /cib: @epoch=28975, @num_updates=0 Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 
[26001] ha-idg-1 crmd: info: abort_transition_graph: Transition 791 aborted by instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 path=/cib/configuration/resources/primitive
Re: [ClusterLabs] crm resource trace
Hi Bernd, On which version you're running for crmsh and SLE? Regards, Xin From: Users on behalf of Lentes, Bernd Sent: Monday, October 17, 2022 6:43 PM To: Pacemaker ML Subject: Re: [ClusterLabs] crm resource trace Hi, i try to find out why there is sometimes a restart of the resource and sometimes not. Unpredictable behaviour is someting i expect from Windows, not from Linux. Here you see two "crm resource trace "resource"". In the first case the resource is restarted , in the second not. The command i used is identical in both cases. ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap Fri Oct 14 19:05:51 CEST 2022 INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/ INFO: Trace set, restart vm-genetrap to trace non-monitor operations == 1st try: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: Diff: --- 7.28974.3 2 Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: + /cib: @epoch=28975, @num_updates=0 Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']: Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op: ++ Oct 14 19:05:52 [26001] ha-idg-1 crmd: info: abort_transition_graph: Transition 791 aborted by instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30'] complete=true Oct 14 19:05:52 [26001] ha-idg-1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_process_request: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=ha-idg-2/cibadmin/2, version=7.28975.0) Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: create op[@id='vm-genetrap-monitor-30'] Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: cib_devices
Re: [ClusterLabs] crm resource trace
- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
>
> It would probably make sense to distinguish between probes that found
> the resource running and probes that found it not running. Parameter
> changes in the former should probably be treated like start.

Is that now a bug or by design ?
And what is the conclusion of it all ?
Do a "crm resource cleanup" before each "crm resource [un]trace" ?
And test everything with ptest before commit ?

Bernd
Re: [ClusterLabs] crm resource trace
On Tue, 2022-10-18 at 20:48 +0200, Lentes, Bernd wrote:
> - On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:
>
> > This turned out to be interesting.
> >
> > In the first case, the resource history contains a start action and a
> > recurring monitor. The parameters to both change, so the resource
> > requires a restart.
> >
> > In the second case, the resource's history was apparently cleaned at
> > some point, so the cluster re-probed it and found it running. That
> > means its history contained only the probe and the recurring monitor.
> > Neither probe nor recurring monitor changes require a restart, so
> > nothing is done.
>
> "vm-genetrap_monitor_0". Is that a probe ?
>
> Bernd

Yes
--
Ken Gaillot
Re: [ClusterLabs] crm resource trace
- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.

"vm-genetrap_monitor_0". Is that a probe ?

Bernd
Re: [ClusterLabs] crm resource trace
On Mon, Oct 17, 2022 at 9:42 PM Ken Gaillot wrote: > This turned out to be interesting. > > In the first case, the resource history contains a start action and a > recurring monitor. The parameters to both change, so the resource > requires a restart. > > In the second case, the resource's history was apparently cleaned at > some point, so the cluster re-probed it and found it running. That > means its history contained only the probe and the recurring monitor. > Neither probe nor recurring monitor changes require a restart, so > nothing is done. > > It would probably make sense to distinguish between probes that found > the resource running and probes that found it not running. Parameter > changes in the former should probably be treated like start. > Which leaves the non trivial task to the RA to determine during a probe if a resource is not just running or stopped but has been started with exactly those parameters - right? May be easy for some RAs and a real issue for others. Not talking of which RA has it implemented like that already. Error code would be generic-error to trigger a stop-start - right? If I'm getting it right even without that in depth checking on probe it would have worked in this case as the probe happens before parameter change. Klaus > > On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote: > > Hi, > > > > i try to find out why there is sometimes a restart of the resource > > and sometimes not. > > Unpredictable behaviour is someting i expect from Windows, not from > > Linux. > > Here you see two "crm resource trace "resource"". > > In the first case the resource is restarted , in the second not. > > The command i used is identical in both cases. 
> > > > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap > > Fri Oct 14 19:05:51 CEST 2022 > > INFO: Trace for vm-genetrap is written to > > /var/lib/heartbeat/trace_ra/ > > INFO: Trace set, restart vm-genetrap to trace non-monitor operations > > > > = > > = > > > > 1st try: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: Diff: --- 7.28974.3 2 > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: + /cib: @epoch=28975, @num_updates=0 > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > > genetrap']/operations/op[@id='vm-genetrap-monitor- > > 30']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > name="trace_ra" value="1" id="vm-genetrap-monito > > r-30-instance_attributes-trace_ra"/> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > ributes> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > > genetrap']/operations/op[@id='vm-genetrap-stop- > > 0']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > name="trace_ra" value="1" id="vm-genetrap-stop-0-ins > > tance_attributes-trace_ra"/> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > tes> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > > genetrap']/operations/op[@id='vm-genetrap-start- > > 0']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > >> name="trace_ra" value="1" id="vm-genetrap-start-0-i > > nstance_attributes-trace_ra"/> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > utes> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ 
/cib/configuration/resources/primitive[@id='vm- > > genetrap']/operations/op[@id='vm-genetrap-migrate_from- > > 0']: > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > name="trace_ra" value="1" id="vm-genetrap-mi > > grate_from-0-instance_attributes-trace_ra"/> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ > > > _attributes> > > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > > genetrap']/operations/op[@id='vm-genetrap-migrate_to- > > 0']: > > Oct 14 19:05:52
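Klaus's idea above — an RA whose probe also verifies which parameters the running instance was started with, and returns a generic error to trigger a stop/start — might look roughly like this sketch (the state file and parameter strings are made up for illustration; a real agent would need its start action to record its parameters somewhere):

```shell
# Hypothetical probe: OCF_ERR_GENERIC when the resource is running but was
# started with different parameters, so the cluster schedules a stop/start.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

probe() {
    statefile="$1"    # where a (hypothetical) start action recorded its params
    configured="$2"   # parameters currently configured in the CIB
    [ -f "$statefile" ] || return "$OCF_NOT_RUNNING"
    running=$(cat "$statefile")
    [ "$running" = "$configured" ] && return "$OCF_SUCCESS"
    return "$OCF_ERR_GENERIC"     # running, but with stale parameters
}

state=$(mktemp)
printf 'ip=10.0.0.1' > "$state"
probe "$state" "ip=10.0.0.1" && echo "rc=$?"   # rc=0: running as configured
probe "$state" "ip=10.0.0.2" || echo "rc=$?"   # rc=1: would force stop/start
rm -f "$state"
```

As Klaus notes, recording and re-checking start parameters is easy for some agents and a real problem for others, which is probably why no agent is known to do it already.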
Re: [ClusterLabs] crm resource trace
This turned out to be interesting. In the first case, the resource history contains a start action and a recurring monitor. The parameters to both change, so the resource requires a restart. In the second case, the resource's history was apparently cleaned at some point, so the cluster re-probed it and found it running. That means its history contained only the probe and the recurring monitor. Neither probe nor recurring monitor changes require a restart, so nothing is done. It would probably make sense to distinguish between probes that found the resource running and probes that found it not running. Parameter changes in the former should probably be treated like start. On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote: > Hi, > > i try to find out why there is sometimes a restart of the resource > and sometimes not. > Unpredictable behaviour is someting i expect from Windows, not from > Linux. > Here you see two "crm resource trace "resource"". > In the first case the resource is restarted , in the second not. > The command i used is identical in both cases. 
> > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap > Fri Oct 14 19:05:51 CEST 2022 > INFO: Trace for vm-genetrap is written to > /var/lib/heartbeat/trace_ra/ > INFO: Trace set, restart vm-genetrap to trace non-monitor operations > > = > = > > 1st try: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: Diff: --- 7.28974.3 2 > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: + /cib: @epoch=28975, @num_updates=0 > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > genetrap']/operations/op[@id='vm-genetrap-monitor- > 30']: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > name="trace_ra" value="1" id="vm-genetrap-monito > r-30-instance_attributes-trace_ra"/> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > ributes> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > genetrap']/operations/op[@id='vm-genetrap-stop- > 0']: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > name="trace_ra" value="1" id="vm-genetrap-stop-0-ins > tance_attributes-trace_ra"/> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > tes> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > genetrap']/operations/op[@id='vm-genetrap-start- > 0']: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ >name="trace_ra" value="1" id="vm-genetrap-start-0-i > nstance_attributes-trace_ra"/> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > utes> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > genetrap']/operations/op[@id='vm-genetrap-migrate_from- > 0']: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > 
cib_perform_op: ++ > name="trace_ra" value="1" id="vm-genetrap-mi > grate_from-0-instance_attributes-trace_ra"/> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > _attributes> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm- > genetrap']/operations/op[@id='vm-genetrap-migrate_to- > 0']: > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++ > name="trace_ra" value="1" id="vm-genetrap-migr > ate_to-0-instance_attributes-trace_ra"/> > Oct 14 19:05:52 [25996] ha-idg-1cib: info: > cib_perform_op: ++
Re: [ClusterLabs] crm resource trace
On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:

The section you highlighted does contain the key difference:

> Oct 14 19:05:52 [26000] ha-idg-1 pengine: info: rsc_action_digest_cmp:
> Parameters to vm-genetrap_start_0 on ha-idg-1 changed: was
> e2eeb4e5d1604535fabae9ce5407d685 vs. now 516b745764a83d26e0d73daf2c65ca38
> (reload:3.0.14) 0:0;82:692:0:167bea02-e39a-4fbc-a09f-3ba4d704c4f9

vs

> Oct 14 19:26:33 [26000] ha-idg-1 pengine: info: rsc_action_digest_cmp:
> Parameters to vm-genetrap_monitor_3 on ha-idg-1 changed: was
> 2c5e72e3ebb855036a484cb7e2823f92 vs. now d81c72a6c99d1a5c2defaa830fb82b23
> (reschedule:3.0.14) 0:0;28:797:0:167bea02-e39a-4fbc-a09f-3ba4d704c4f9

In the first case, Pacemaker detected that the start action parameters changed, but in the other case, it only detected that the recurring monitor changed. Recurring monitors can be changed without requiring a full restart.

I'm not sure why the start change wasn't detected in the second case. Immediately after the log messages you showed for each case, there should be a "saving inputs in " message. If you can privately email me those two files, I can try to figure out what happened.
--
Ken Gaillot
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
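The rsc_action_digest_cmp comparison in those logs can be modeled as hashing the parameter set each recorded action last ran with and comparing it against a hash of the current configuration (a rough sketch, not Pacemaker's actual canonicalization; the parameter strings below are illustrative):

```shell
# Rough model of the per-action parameter digests seen in the logs above.
digest() {
    printf '%s' "$1" | md5sum | awk '{print $1}'
}

was=$(digest "config=/example/genetrap.xml")            # params at last start
now=$(digest "config=/example/genetrap.xml trace_ra=1") # params after 'trace'

if [ "$was" != "$now" ]; then
    echo "Parameters to vm-genetrap_start_0 changed"    # => restart scheduled
fi
```

The key point is that a digest is kept per recorded action (start, monitor, probe), which is why a history that lacks a start action has no start digest to mismatch.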
Re: [ClusterLabs] crm resource trace
Hi,

I try to find out why there is sometimes a restart of the resource and sometimes not. Unpredictable behaviour is something I expect from Windows, not from Linux. Here you see two runs of "crm resource trace <resource>". In the first case the resource is restarted, in the second it is not. The command I used is identical in both cases.

ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
Fri Oct 14 19:05:51 CEST 2022
INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
INFO: Trace set, restart vm-genetrap to trace non-monitor operations

== 1st try:

Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: Diff: --- 7.28974.3 2
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=28975, @num_updates=0
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++
Oct 14 19:05:52 [26001] ha-idg-1 crmd: info: abort_transition_graph: Transition 791 aborted by instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30'] complete=true
Oct 14 19:05:52 [26001] ha-idg-1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_process_request: Completed cib_apply_diff operation for section 'all': OK (rc=0, origin=ha-idg-2/cibadmin/2, version=7.28975.0)
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: update_cib_stonith_devices_v2: Updating device list from the cib: create op[@id='vm-genetrap-monitor-30']
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: cib_devices_update: Updating devices to version 7.28975.0
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_file_backup: Archived previous version as
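[Editor's sketch] The restart/no-restart split reported above can be summed up in a toy model. This is a simplification of Ken Gaillot's explanation elsewhere in this thread, not pacemaker code: a parameter change (such as adding trace_ra to the operations) forces a restart only when the resource's recorded history contains a start action; if the history was cleaned and holds only a probe plus the recurring monitor, the change is effectively ignored.

```shell
# Toy model, NOT actual pacemaker logic: decide whether a parameter
# change triggers a restart based on which actions are in the
# resource's recorded operation history.
restart_on_param_change() {
    # $@ = actions currently recorded in the resource history
    for action in "$@"; do
        # Only a recorded start makes the changed parameters "count"
        [ "$action" = "start" ] && return 0
    done
    # Probe and recurring-monitor entries alone do not force a restart
    return 1
}

# First case in the log: start + monitor in history => restart
restart_on_param_change start monitor && echo "restart"
# Second case: history cleaned, only probe + monitor => no restart
restart_on_param_change probe monitor || echo "no restart"
```

This also explains why a "crm resource cleanup" before trace/untrace changes the outcome: the cleanup wipes the start entry from the history.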
Re: [ClusterLabs] crm resource trace (Was: Re: trace of resource - sometimes restart, sometimes not)
- On 7 Oct, 2022, at 21:37, Reid Wahl nw...@redhat.com wrote:

> On Fri, Oct 7, 2022 at 6:02 AM Lentes, Bernd wrote:
>> - On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote:
>>
>> > How did you set a trace just for monitor?
>>
>> crm resource trace dlm monitor.
>
> crm resource trace adds "trace_ra=1" to the end of the monitor operation:
> https://github.com/ClusterLabs/crmsh/blob/8cf6a9d13af6496fdd384c18c54680ceb354b72d/crmsh/ui_resource.py#L638-L646
>
> That's a schema violation and pcs doesn't even allow it. I installed
> `crmsh` and tried to reproduce... `trace_ra=1` shows up in the
> configuration for the monitor operation but it gets ignored. I don't
> get *any* trace logs. That makes sense -- ocf-shellfuncs.in enables
> tracing only if OCF_RESKEY_trace_ra is true. Pacemaker doesn't add
> operation attributes to the OCF_RESKEY_* environment variables... at
> least in the current upstream main.
>
> Apparently (since you got logs) this works in some way, or worked at
> some point in the past. Out of curiosity, what version are you on?

SLES 12 SP5:

ha-idg-1:/usr/lib/ocf/resource.d/heartbeat # rpm -qa | grep -iE 'pacemaker|corosync'
libpacemaker3-1.1.24+20210811.f5abda0ee-3.21.9.x86_64
corosync-2.3.6-9.22.1.x86_64
pacemaker-debugsource-1.1.23+20200622.28dd98fad-3.9.2.20591.0.PTF.1177212.x86_64
libcorosync4-2.3.6-9.22.1.x86_64
pacemaker-cli-1.1.24+20210811.f5abda0ee-3.21.9.x86_64
pacemaker-cts-1.1.24+20210811.f5abda0ee-3.21.9.x86_64
pacemaker-1.1.24+20210811.f5abda0ee-3.21.9.x86_64

Bernd
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
[ClusterLabs] crm resource trace (Was: Re: trace of resource - sometimes restart, sometimes not)
On Fri, Oct 7, 2022 at 6:02 AM Lentes, Bernd wrote:
>
> - On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote:
>
> > How did you set a trace just for monitor?
>
> crm resource trace dlm monitor.

crm resource trace adds "trace_ra=1" to the end of the monitor operation:
https://github.com/ClusterLabs/crmsh/blob/8cf6a9d13af6496fdd384c18c54680ceb354b72d/crmsh/ui_resource.py#L638-L646

That's a schema violation and pcs doesn't even allow it. I installed `crmsh` and tried to reproduce... `trace_ra=1` shows up in the configuration for the monitor operation but it gets ignored. I don't get *any* trace logs. That makes sense -- ocf-shellfuncs.in enables tracing only if OCF_RESKEY_trace_ra is true. Pacemaker doesn't add operation attributes to the OCF_RESKEY_* environment variables... at least in the current upstream main.

Apparently (since you got logs) this works in some way, or worked at some point in the past. Out of curiosity, what version are you on?

> > Wish I could help with that -- it's mostly a mystery to me too ;)
>
> :-))

--
Regards,
Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker
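[Editor's sketch] The gating Reid describes can be illustrated with a minimal stand-in. This is NOT the upstream ocf-shellfuncs.in code: ra_is_true is a hypothetical simplification of ocf_is_true, and the real shellfuncs code additionally redirects the xtrace output under /var/lib/heartbeat/trace_ra/ before enabling it.

```shell
# Simplified sketch, assuming the mechanism Reid describes: an agent
# arms tracing only when OCF_RESKEY_trace_ra holds a truthy value.

# Hypothetical stand-in for ocf_is_true from ocf-shellfuncs
ra_is_true() {
    case "$1" in
        yes|true|1|YES|TRUE|on|ON) return 0 ;;
        *) return 1 ;;
    esac
}

trace_wanted() {
    # On a true result the real code would run `set -x` and redirect
    # the trace into a per-agent file; we only report the decision here.
    ra_is_true "$OCF_RESKEY_trace_ra"
}

OCF_RESKEY_trace_ra=1
trace_wanted && echo "tracing enabled"
OCF_RESKEY_trace_ra=no
trace_wanted || echo "tracing disabled"
```

This is why trace_ra=1 sitting in the CIB is not enough by itself: unless something exports it into the agent's OCF_RESKEY_* environment for the operation, the gate above never fires and no trace file appears.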