Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Ken Gaillot
On Fri, 2022-10-21 at 13:05 +0200, Lentes, Bernd wrote:
> - On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com
> wrote:
> 
> > This turned out to be interesting.
> > 
> > In the first case, the resource history contains a start action and
> > a
> > recurring monitor. The parameters to both change, so the resource
> > requires a restart.
> > 
> > In the second case, the resource's history was apparently cleaned
> > at
> > some point, so the cluster re-probed it and found it running. That
> > means its history contained only the probe and the recurring
> > monitor.
> > Neither probe nor recurring monitor changes require a restart, so
> > nothing is done.
> > 
> > It would probably make sense to distinguish between probes that
> > found
> > the resource running and probes that found it not running.
> > Parameter
> > changes in the former should probably be treated like start.
> > 
> 
> Is that now a bug or by design?

It was by design, though that aspect of it was questionable.

> And what is the conclusion of it all?

From the rest of the thread, I suspect that this has been fixed in a
later version, though I'm not sure which changes were relevant. A lot
of work has been done on the digest code in the past couple of years.

> Do a "crm resource cleanup" before each "crm resource [un]trace"?
> And test everything with ptest before commit?
> 
> Bernd
-- 
Ken Gaillot 



Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Ken Gaillot
On Mon, 2022-10-24 at 11:22 +0200, Klaus Wenninger wrote:
> 
> 
> On Mon, Oct 24, 2022 at 11:10 AM Xin Liang via Users <
> users@clusterlabs.org> wrote:
> > Hi Bernd,
> > 
> > The behavior differs between SLE15SP4 and SLE12SP5.
> > 
> > On 12sp5:
> > run `crm_resource --cleanup --resource <resource>`, and the resource
> > is not restarted on trace/untrace
> > On 15sp4:
> > run `crm_resource --cleanup --resource <resource>`, and the resource
> > is still restarted on trace/untrace
> > 
> 
> Hmm ... thanks for the update!
> I do remember reviewing some PRs dealing with digests, but
> obviously not in enough detail to tell whether some upstream change
> might have 'fixed' the issue.
> Maybe Ken can still tell off the top of his head.

There were some very recent changes by Gao Yan just released with
2.1.5-rc1, to ensure resources are restarted if a parameter changes
that was specified on an operation rather than the resource itself.
SUSE may have backported that already.
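For context on what those operation-specified parameters look like here: crmsh
implements trace/untrace by adding a trace_ra=1 nvpair to the instance
attributes of each operation (the cib diffs later in this thread show it being
added to monitor, stop, start and the migrate operations). One way to see those
per-operation values on a live cluster should be something like this, with the
resource id just taken from the example in this thread:

    cibadmin --query --xpath \
        "//primitive[@id='vm-genetrap']//op//nvpair[@name='trace_ra']"

Whether a change to such operation-level parameters forces a restart is exactly
what the digest handling mentioned above decides.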

> 
> Klaus 
> > From: Users  on behalf of Lentes,
> > Bernd 
> > Sent: Monday, October 24, 2022 4:46 PM
> > To: Pacemaker ML 
> > Subject: Re: [ClusterLabs] crm resource trace
> >  
> > 
> > - On 24 Oct, 2022, at 10:08, Klaus Wenninger 
> > kwenn...@redhat.com wrote:
> > 
> > > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
> > 
> > 
> > 
> > > Did you try a cleanup in between?
> > 
> > When I do a cleanup before trace/untrace, the resource is not
> > restarted.
> > When I don't do a cleanup, it is restarted.
> > 
> > Bernd
-- 
Ken Gaillot 



Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 11:10 AM Xin Liang via Users 
wrote:

> Hi Bernd,
>
> The behavior differs between SLE15SP4 and SLE12SP5.
>
> On 12sp5:
>
>- run `crm_resource --cleanup --resource <resource>`, and the resource
>is not restarted on trace/untrace
>
> On 15sp4:
>
>- run `crm_resource --cleanup --resource <resource>`, and the resource
>is still restarted on trace/untrace
>
>
Hmm ... thanks for the update!
I do remember reviewing some PRs dealing with digests, but
obviously not in enough detail to tell whether some upstream change
might have 'fixed' the issue.
Maybe Ken can still tell off the top of his head.

Klaus

>
>
> --
> *From:* Users  on behalf of Lentes, Bernd <
> bernd.len...@helmholtz-muenchen.de>
> *Sent:* Monday, October 24, 2022 4:46 PM
> *To:* Pacemaker ML 
> *Subject:* Re: [ClusterLabs] crm resource trace
>
>
> - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com
> wrote:
>
> > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
>
>
>
> > Did you try a cleanup in between?
>
> When I do a cleanup before trace/untrace, the resource is not restarted.
> When I don't do a cleanup, it is restarted.
>
> Bernd


Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Xin Liang via Users
Hi Bernd,

The behavior differs between SLE15SP4 and SLE12SP5.

On 12sp5:

  *   run `crm_resource --cleanup --resource <resource>`; the resource is then
not restarted on trace/untrace

On 15sp4:

  *   run `crm_resource --cleanup --resource <resource>`; the resource is still
restarted on trace/untrace
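(For reference, the sequence being compared is roughly the following, with
"vip" standing in for the resource id used earlier in the thread:

    crm_resource --cleanup --resource vip    # wipe the recorded history first
    crm resource trace vip                   # adds trace_ra=1 to the operations
    crm resource untrace vip                 # removes it again

On 12sp5 the cleanup keeps the subsequent trace/untrace from restarting the
resource; on 15sp4 the resource is restarted anyway.)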


From: Users  on behalf of Lentes, Bernd 

Sent: Monday, October 24, 2022 4:46 PM
To: Pacemaker ML 
Subject: Re: [ClusterLabs] crm resource trace


- On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:

> On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:



> Did you try a cleanup in between?

When I do a cleanup before trace/untrace, the resource is not restarted.
When I don't do a cleanup, it is restarted.

Bernd


Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 10:46 AM Lentes, Bernd <
bernd.len...@helmholtz-muenchen.de> wrote:

>
> - On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com
> wrote:
>
> > On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:
>
>
>
> > Did you try a cleanup in between?
>
> When I do a cleanup before trace/untrace, the resource is not restarted.
> When I don't do a cleanup, it is restarted.
>

Sorry Bernd for not being explicit - I did get that far ;-)
I wanted to see whether Xin Liang had tried the cleanup as well.

Klaus

>
> Bernd


Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Lentes, Bernd

- On 24 Oct, 2022, at 10:08, Klaus Wenninger kwenn...@redhat.com wrote:

> On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users <users@clusterlabs.org> wrote:



> Did you try a cleanup in between?

When I do a cleanup before trace/untrace, the resource is not restarted.
When I don't do a cleanup, it is restarted.

Bernd



Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Klaus Wenninger
On Mon, Oct 24, 2022 at 9:50 AM Xin Liang via Users 
wrote:

> Hi Bernd,
>
> I got it, you are on SLE12SP5, and the crmsh version
> is crmsh-4.1.1+git.1647830282.d380378a-2.74.2.noarch, right?
>
> I tried to reproduce this inconsistent behavior: I added an IPaddr2 resource
> "vip" and ran `crm resource trace vip` and `crm resource untrace vip`.
>
> Each time, the resource vip was restarted ("due to resource definition
> change").
>

Did you try a cleanup in between?

Klaus

>
> I can't reproduce a case where the resource is not restarted on trace/untrace.
>
>
> Regards,
> Xin
>
>
> --
> *From:* Users  on behalf of Xin Liang via
> Users 
> *Sent:* Monday, October 24, 2022 10:29 AM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Cc:* Xin Liang 
> *Subject:* Re: [ClusterLabs] crm resource trace
>
> Hi Bernd,
>
> Which versions of crmsh and SLE are you running?
>
>
> Regards,
> Xin
> --
> *From:* Users  on behalf of Lentes, Bernd <
> bernd.len...@helmholtz-muenchen.de>
> *Sent:* Monday, October 17, 2022 6:43 PM
> *To:* Pacemaker ML 
> *Subject:* Re: [ClusterLabs] crm resource trace
>
> Hi,
>
> I'm trying to find out why there is sometimes a restart of the resource and
> sometimes not.
> Unpredictable behaviour is something I expect from Windows, not from Linux.
> Here you see two runs of "crm resource trace <resource>".
> In the first case the resource is restarted, in the second not.
> The command I used is identical in both cases.
>
> ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> Fri Oct 14 19:05:51 CEST 2022
> INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
> INFO: Trace set, restart vm-genetrap to trace non-monitor operations
>
>
> ==
>
> 1st try:
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> Diff: --- 7.28974.3 2
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  +
> /cib:  @epoch=28975, @num_updates=0
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++
> /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:
> ++
> 
> Oct 14 19:05:52 [26001] ha-idg-1   crmd: info:
> abort_transition_graph:  Transition 791 aborted by
> instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create':
> Configuration change | cib=7.28975.0 source=te_update_diff_v2:483
> path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']
> complete=true
> Oct 14 19:05:52 [26001] ha-idg-1   crmd:   notice:
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE |
> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_process_request:   

Re: [ClusterLabs] crm resource trace

2022-10-24 Thread Xin Liang via Users
Hi Bernd,

I got it, you are on SLE12SP5, and the crmsh version is 
crmsh-4.1.1+git.1647830282.d380378a-2.74.2.noarch, right?

I tried to reproduce this inconsistent behavior: I added an IPaddr2 resource
"vip" and ran `crm resource trace vip` and `crm resource untrace vip`.

Each time, the resource vip was restarted ("due to resource definition
change").

I can't reproduce a case where the resource is not restarted on trace/untrace.


Regards,
Xin



From: Users  on behalf of Xin Liang via Users 

Sent: Monday, October 24, 2022 10:29 AM
To: Cluster Labs - All topics related to open-source clustering welcomed 

Cc: Xin Liang 
Subject: Re: [ClusterLabs] crm resource trace

Hi Bernd,

Which versions of crmsh and SLE are you running?


Regards,
Xin

From: Users  on behalf of Lentes, Bernd 

Sent: Monday, October 17, 2022 6:43 PM
To: Pacemaker ML 
Subject: Re: [ClusterLabs] crm resource trace

Hi,

I'm trying to find out why there is sometimes a restart of the resource and
sometimes not.
Unpredictable behaviour is something I expect from Windows, not from Linux.
Here you see two runs of "crm resource trace <resource>".
In the first case the resource is restarted, in the second not.
The command I used is identical in both cases.

ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
Fri Oct 14 19:05:51 CEST 2022
INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
INFO: Trace set, restart vm-genetrap to trace non-monitor operations

==

1st try:
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  Diff: 
--- 7.28974.3 2
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  Diff: 
+++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  +  
/cib:  @epoch=28975, @num_updates=0
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  


Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [26001] ha-idg-1   crmd: info: abort_transition_graph:  
Transition 791 aborted by 
instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': 
Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 
path=/cib/configuration/resources/primitive

Re: [ClusterLabs] crm resource trace

2022-10-23 Thread Xin Liang via Users
Hi Bernd,

Which versions of crmsh and SLE are you running?


Regards,
Xin

From: Users  on behalf of Lentes, Bernd 

Sent: Monday, October 17, 2022 6:43 PM
To: Pacemaker ML 
Subject: Re: [ClusterLabs] crm resource trace

Hi,

I'm trying to find out why there is sometimes a restart of the resource and
sometimes not.
Unpredictable behaviour is something I expect from Windows, not from Linux.
Here you see two runs of "crm resource trace <resource>".
In the first case the resource is restarted, in the second not.
The command I used is identical in both cases.

ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
Fri Oct 14 19:05:51 CEST 2022
INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
INFO: Trace set, restart vm-genetrap to trace non-monitor operations

==

1st try:
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  Diff: 
--- 7.28974.3 2
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  Diff: 
+++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  +  
/cib:  @epoch=28975, @num_updates=0
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  


Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [26001] ha-idg-1   crmd: info: abort_transition_graph:  
Transition 791 aborted by 
instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': 
Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 
path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']
 complete=true
Oct 14 19:05:52 [26001] ha-idg-1   crmd:   notice: do_state_transition: 
State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_process_request: 
Completed cib_apply_diff operation for section 'all': OK (rc=0, 
origin=ha-idg-2/cibadmin/2, version=7.28975.0)
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: 
update_cib_stonith_devices_v2:   Updating device list from the cib: create 
op[@id='vm-genetrap-monitor-30']
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: cib_devices

Re: [ClusterLabs] crm resource trace

2022-10-21 Thread Lentes, Bernd

- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:

> This turned out to be interesting.
> 
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
> 
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
> 
> It would probably make sense to distinguish between probes that found
> the resource running and probes that found it not running. Parameter
> changes in the former should probably be treated like start.
> 

Is that now a bug or by design?
And what is the conclusion of it all?
Do a "crm resource cleanup" before each "crm resource [un]trace"?
And test everything with ptest before commit?

Bernd
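(A cautious version of that workflow, sketched with crmsh - the resource id is
only an example and the exact subcommand names depend on the crmsh version:

    crm resource cleanup vm-genetrap     # refresh the recorded history first
    crm configure                        # enter the configure sublevel
      # inside the configure shell:
      #   edit      - stage the trace_ra change by hand
      #   ptest     - preview the transition the change would cause
      #   commit    - apply only if the predicted actions look right

ptest/simulate shows whether the staged change would schedule a restart before
anything hits the live CIB.)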



Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Ken Gaillot
On Tue, 2022-10-18 at 20:48 +0200, Lentes, Bernd wrote:
> - On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com
> wrote:
> 
> > This turned out to be interesting.
> > 
> > In the first case, the resource history contains a start action and
> > a
> > recurring monitor. The parameters to both change, so the resource
> > requires a restart.
> > 
> > In the second case, the resource's history was apparently cleaned
> > at
> > some point, so the cluster re-probed it and found it running. That
> > means its history contained only the probe and the recurring
> > monitor.
> > Neither probe nor recurring monitor changes require a restart, so
> > nothing is done.
> 
> "vm-genetrap_monitor_0". Is that a probe ?
> 
> Bernd

Yes
-- 
Ken Gaillot 



Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Lentes, Bernd

- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:

> This turned out to be interesting.
> 
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
> 
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.

"vm-genetrap_monitor_0". Is that a probe ?

Bernd



Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Klaus Wenninger
On Mon, Oct 17, 2022 at 9:42 PM Ken Gaillot  wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
>
> It would probably make sense to distinguish between probes that found
> the resource running and probes that found it not running. Parameter
> changes in the former should probably be treated like start.
>

Which leaves the RA with the non-trivial task of determining during a probe
whether a resource is not just running or stopped, but has been started with
exactly those parameters - right? That may be easy for some RAs and a real
issue for others, not to mention the question of which RAs already implement
it that way. The error code would be a generic error, to trigger a stop/start
- right?
If I'm getting it right, even without that in-depth checking on probe it
would have worked in this case, as the probe happened before the parameter
change.

Klaus
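(A minimal sketch of that idea, purely illustrative - read_live_config and the
"config" parameter are hypothetical, and no shipped agent is claimed to work
this way:

    vm_monitor() {
        live_cfg=$(read_live_config) || return $OCF_NOT_RUNNING  # hypothetical helper
        # interval 0 means this invocation is a probe
        if [ "${OCF_RESKEY_CRM_meta_interval:-0}" -eq 0 ] &&
           [ "$live_cfg" != "$OCF_RESKEY_config" ]; then
            # running, but not with the configured parameters:
            # a generic error makes the cluster stop and restart it
            return $OCF_ERR_GENERIC
        fi
        return $OCF_SUCCESS
    }

How feasible that comparison is clearly varies a lot between agents.)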

>
> On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:
> > Hi,
> >
> > I'm trying to find out why there is sometimes a restart of the resource
> > and sometimes not.
> > Unpredictable behaviour is something I expect from Windows, not from
> > Linux.
> > Here you see two runs of "crm resource trace <resource>".
> > In the first case the resource is restarted, in the second not.
> > The command I used is identical in both cases.
> >
> > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> > Fri Oct 14 19:05:51 CEST 2022
> > INFO: Trace for vm-genetrap is written to
> > /var/lib/heartbeat/trace_ra/
> > INFO: Trace set, restart vm-genetrap to trace non-monitor operations
> >
> > =
> > =
> >
> > 1st try:
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  Diff: --- 7.28974.3 2
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  +  /cib:  @epoch=28975, @num_updates=0
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-monitor-
> > 30']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >   > name="trace_ra" value="1" id="vm-genetrap-monito
> > r-30-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> > > ributes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-stop-
> > 0']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >   > name="trace_ra" value="1" id="vm-genetrap-stop-0-ins
> > tance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> > > tes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-start-
> > 0']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >> name="trace_ra" value="1" id="vm-genetrap-start-0-i
> > nstance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >  > utes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-migrate_from-
> > 0']:  
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> >   > name="trace_ra" value="1" id="vm-genetrap-mi
> > grate_from-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++
> > > _attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> > cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> > genetrap']/operations/op[@id='vm-genetrap-migrate_to-
> > 0']:  
> > Oct 14 19:05:52 

Re: [ClusterLabs] crm resource trace

2022-10-17 Thread Ken Gaillot
This turned out to be interesting.

In the first case, the resource history contains a start action and a
recurring monitor. The parameters to both change, so the resource
requires a restart.

In the second case, the resource's history was apparently cleaned at
some point, so the cluster re-probed it and found it running. That
means its history contained only the probe and the recurring monitor.
Neither probe nor recurring monitor changes require a restart, so
nothing is done.

It would probably make sense to distinguish between probes that found
the resource running and probes that found it not running. Parameter
changes in the former should probably be treated like start.
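(One way to check which operations are actually in the recorded history for a
resource - and therefore which digests a parameter change will be compared
against - should be something like:

    crm_resource --list-operations --resource vm-genetrap

A history that only shows the probe (the *_monitor_0 entry) plus the recurring
monitor corresponds to the second case described above.)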

On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:
> Hi,
> 
> I'm trying to find out why there is sometimes a restart of the resource
> and sometimes not.
> Unpredictable behaviour is something I expect from Windows, not from
> Linux.
> Here you see two runs of "crm resource trace <resource>".
> In the first case the resource is restarted, in the second not.
> The command I used is identical in both cases.
> 
> ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> Fri Oct 14 19:05:51 CEST 2022
> INFO: Trace for vm-genetrap is written to
> /var/lib/heartbeat/trace_ra/
> INFO: Trace set, restart vm-genetrap to trace non-monitor operations
> 
> =
> =
> 
> 1st try:
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  Diff: --- 7.28974.3 2
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  +  /cib:  @epoch=28975, @num_updates=0
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> genetrap']/operations/op[@id='vm-genetrap-monitor-
> 30']:  
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
>   name="trace_ra" value="1" id="vm-genetrap-monito
> r-30-instance_attributes-trace_ra"/>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
> ributes>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> genetrap']/operations/op[@id='vm-genetrap-stop-
> 0']:  
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
>   name="trace_ra" value="1" id="vm-genetrap-stop-0-ins
> tance_attributes-trace_ra"/>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
> tes>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> genetrap']/operations/op[@id='vm-genetrap-start-
> 0']:  
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
>name="trace_ra" value="1" id="vm-genetrap-start-0-i
> nstance_attributes-trace_ra"/>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
>  utes>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> genetrap']/operations/op[@id='vm-genetrap-migrate_from-
> 0']:  
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
>   name="trace_ra" value="1" id="vm-genetrap-mi
> grate_from-0-instance_attributes-trace_ra"/>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
> _attributes>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-
> genetrap']/operations/op[@id='vm-genetrap-migrate_to-
> 0']:  
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++  
> name="trace_ra" value="1" id="vm-genetrap-migr
> ate_to-0-instance_attributes-trace_ra"/>
> Oct 14 19:05:52 [25996] ha-idg-1cib: info:
> cib_perform_op:  ++   

Re: [ClusterLabs] crm resource trace

2022-10-17 Thread Ken Gaillot
On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:

The section you highlighted does contain the key difference:

> Oct 14 19:05:52 [26000] ha-idg-1pengine: info:
> rsc_action_digest_cmp:   Parameters to vm-genetrap_start_0 on ha-idg-
> 1 changed: was e2eeb4e5d1604535fabae9ce5407d685 vs. now
> 516b745764a83d26e0d73daf2c65ca38 (reload:3.0.14)
> 0:0;82:692:0:167bea02-e39a-4fbc-a09f-3ba4d704c4f9

vs

> Oct 14 19:26:33 [26000] ha-idg-1pengine: info:
> rsc_action_digest_cmp:   Parameters to vm-genetrap_monitor_3 on
> ha-idg-1 changed: was 2c5e72e3ebb855036a484cb7e2823f92 vs. now
> d81c72a6c99d1a5c2defaa830fb82b23 (reschedule:3.0.14)
> 0:0;28:797:0:167bea02-e39a-4fbc-a09f-3ba4d704c4f9

In the first case, Pacemaker detected that the start action parameters
changed, but in the other case, it only detected that the recurring
monitor changed. Recurring monitors can be changed without requiring a
full restart.

I'm not sure why the start change wasn't detected in the second case.
Immediately after the log messages you showed for each case, there
should be a "saving inputs in " message. If you can privately
email me those two files, I can try to figure out what happened.
-- 
Ken Gaillot 
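(Those saved inputs can be replayed offline to see what the scheduler decided
and why, for example:

    crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-NNN.bz2

with NNN taken from the "saving inputs in ..." log line; the pengine directory
shown here is the usual default and may differ between distributions.)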



Re: [ClusterLabs] crm resource trace

2022-10-17 Thread Lentes, Bernd
Hi,

I'm trying to find out why there is sometimes a restart of the resource and
sometimes not.
Unpredictable behaviour is something I expect from Windows, not from Linux.
Here you see two runs of "crm resource trace <resource>".
In the first case the resource is restarted, in the second not.
The command I used is identical in both cases.

ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
Fri Oct 14 19:05:51 CEST 2022
INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
INFO: Trace set, restart vm-genetrap to trace non-monitor operations

==

1st try:
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  Diff: 
--- 7.28974.3 2
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  Diff: 
+++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  +  
/cib:  @epoch=28975, @num_updates=0
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  


Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++ 
/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
  
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

   
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_perform_op:  ++  

 
Oct 14 19:05:52 [26001] ha-idg-1   crmd: info: abort_transition_graph:  
Transition 791 aborted by 
instance_attributes.vm-genetrap-monitor-30-instance_attributes 'create': 
Configuration change | cib=7.28975.0 source=te_update_diff_v2:483 
path=/cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']
 complete=true
Oct 14 19:05:52 [26001] ha-idg-1   crmd:   notice: do_state_transition: 
State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC 
cause=C_FSA_INTERNAL origin=abort_transition_graph
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_process_request: 
Completed cib_apply_diff operation for section 'all': OK (rc=0, 
origin=ha-idg-2/cibadmin/2, version=7.28975.0)
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: 
update_cib_stonith_devices_v2:   Updating device list from the cib: create 
op[@id='vm-genetrap-monitor-30']
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng: info: cib_devices_update:  
Updating devices to version 7.28975.0
Oct 14 19:05:52 [25997] ha-idg-1 stonith-ng:   notice: unpack_config:   On loss 
of CCM Quorum: Ignore
Oct 14 19:05:52 [25996] ha-idg-1cib: info: cib_file_backup: 
Archived previous version as 

Re: [ClusterLabs] crm resource trace (Was: Re: trace of resource - sometimes restart, sometimes not)

2022-10-10 Thread Lentes, Bernd
- On 7 Oct, 2022, at 21:37, Reid Wahl nw...@redhat.com wrote:

> On Fri, Oct 7, 2022 at 6:02 AM Lentes, Bernd
>  wrote:
>> - On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote:
>>
>> > How did you set a trace just for monitor?
>>
>> crm resource trace dlm monitor.
> 
> crm resource trace <resource> <operation> adds "trace_ra=1" to the end of the
> monitor operation:
> https://github.com/ClusterLabs/crmsh/blob/8cf6a9d13af6496fdd384c18c54680ceb354b72d/crmsh/ui_resource.py#L638-L646
> 
> That's a schema violation and pcs doesn't even allow it. I installed
> `crmsh` and tried to reproduce... `trace_ra=1` shows up in the
> configuration for the monitor operation but it gets ignored. I don't
> get *any* trace logs. That makes sense -- ocf-shellfuncs.in enables
> tracing only if OCF_RESKEY_trace_ra is true. Pacemaker doesn't add
> operation attributes to the OCF_RESKEY_* environment variables... at
> least in the current upstream main.
> 
> Apparently (since you got logs) this works in some way, or worked at
> some point in the past. Out of curiosity, what version are you on?
> 

SLES 12 SP5:
ha-idg-1:/usr/lib/ocf/resource.d/heartbeat # rpm -qa|grep -iE 
'pacemaker|corosync'

libpacemaker3-1.1.24+20210811.f5abda0ee-3.21.9.x86_64
corosync-2.3.6-9.22.1.x86_64
pacemaker-debugsource-1.1.23+20200622.28dd98fad-3.9.2.20591.0.PTF.1177212.x86_64
libcorosync4-2.3.6-9.22.1.x86_64
pacemaker-cli-1.1.24+20210811.f5abda0ee-3.21.9.x86_64
pacemaker-cts-1.1.24+20210811.f5abda0ee-3.21.9.x86_64
pacemaker-1.1.24+20210811.f5abda0ee-3.21.9.x86_64


Bernd



[ClusterLabs] crm resource trace (Was: Re: trace of resource - sometimes restart, sometimes not)

2022-10-07 Thread Reid Wahl
On Fri, Oct 7, 2022 at 6:02 AM Lentes, Bernd
 wrote:
>
>
>
> - On 7 Oct, 2022, at 01:18, Reid Wahl nw...@redhat.com wrote:
>
> > How did you set a trace just for monitor?
>
> crm resource trace dlm monitor.

crm resource trace <resource> <operation> adds "trace_ra=1" to the end of the
monitor operation:
https://github.com/ClusterLabs/crmsh/blob/8cf6a9d13af6496fdd384c18c54680ceb354b72d/crmsh/ui_resource.py#L638-L646

That's a schema violation and pcs doesn't even allow it. I installed
`crmsh` and tried to reproduce... `trace_ra=1` shows up in the
configuration for the monitor operation but it gets ignored. I don't
get *any* trace logs. That makes sense -- ocf-shellfuncs.in enables
tracing only if OCF_RESKEY_trace_ra is true. Pacemaker doesn't add
operation attributes to the OCF_RESKEY_* environment variables... at
least in the current upstream main.

Apparently (since you got logs) this works in some way, or worked at
some point in the past. Out of curiosity, what version are you on?
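For comparison, a parameter set on the resource itself (rather than on an
operation) does get exported to the agent as OCF_RESKEY_trace_ra, so a blunt
but schema-clean way to trace every action should be something along the lines
of the following - resource id taken from earlier in the thread, and note that
this edits the resource definition, which itself counts as a parameter change:

    crm_resource --resource dlm --set-parameter trace_ra --parameter-value 1

whereas a value attached only to an operation's instance_attributes is, as
described above, not turned into an environment variable by current Pacemaker.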

>
> > Wish I could help with that -- it's mostly a mystery to me too ;)
>
> :-))



-- 
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker
