Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Ken Gaillot
On Tue, 2022-10-18 at 20:48 +0200, Lentes, Bernd wrote:
> - On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com
> wrote:
> 
> > This turned out to be interesting.
> > 
> > In the first case, the resource history contains a start action and a
> > recurring monitor. The parameters to both change, so the resource
> > requires a restart.
> > 
> > In the second case, the resource's history was apparently cleaned at
> > some point, so the cluster re-probed it and found it running. That
> > means its history contained only the probe and the recurring monitor.
> > Neither probe nor recurring monitor changes require a restart, so
> > nothing is done.
> 
> "vm-genetrap_monitor_0". Is that a probe?
> 
> Bernd

Yes
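Some background on why the answer is yes: Pacemaker names each operation in a resource's history as `<resource>_<action>_<interval in ms>`, and a monitor with interval 0 is a one-time probe. Below is a minimal Python sketch that splits such a key. `parse_op_key`, `is_probe` and `OpKey` are hypothetical helpers written for illustration, not Pacemaker's API. The key is parsed from the right because resource IDs may contain underscores, and only migrate_to and migrate_from are handled as multi-word action names, which is a simplifying assumption.

```python
from typing import NamedTuple

# Assumption: the only underscore-containing action names this sketch handles.
COMPOUND_ACTIONS = ("migrate_to", "migrate_from")


class OpKey(NamedTuple):
    resource: str
    action: str
    interval_ms: int


def parse_op_key(key: str) -> OpKey:
    """Split '<resource>_<action>_<interval_ms>' into its parts."""
    head, sep, interval = key.rpartition("_")
    if not sep or not interval.isdigit():
        raise ValueError(f"not an operation key: {key!r}")
    # Check the multi-word action names first, e.g. 'vm-genetrap_migrate_to'.
    for action in COMPOUND_ACTIONS:
        suffix = "_" + action
        if head.endswith(suffix) and len(head) > len(suffix):
            return OpKey(head[: -len(suffix)], action, int(interval))
    resource, sep, action = head.rpartition("_")
    if not sep or not resource:
        raise ValueError(f"not an operation key: {key!r}")
    return OpKey(resource, action, int(interval))


def is_probe(key: str) -> bool:
    """A probe is a monitor with interval 0."""
    op = parse_op_key(key)
    return op.action == "monitor" and op.interval_ms == 0


print(is_probe("vm-genetrap_monitor_0"))      # True
print(is_probe("vm-genetrap_monitor_30000"))  # False
```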
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Lentes, Bernd

- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:

> This turned out to be interesting.
> 
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
> 
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
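The rule Ken describes can be modeled in a few lines. This is a hypothetical Python sketch, not Pacemaker's scheduler code. `HistoryEntry` and `needs_restart` are invented names, and the 30 s interval of the recurring monitor (30000 ms) is an assumption based on the op id vm-genetrap-monitor-30 in the log later in the thread. It shows that the same parameter change gives different outcomes depending only on what the history contains.

```python
from typing import NamedTuple


class HistoryEntry(NamedTuple):
    action: str       # e.g. "start", "stop", "monitor"
    interval_ms: int  # 0 for one-shot actions, including probes


def needs_restart(history: list[HistoryEntry],
                  changed: set[tuple[str, int]]) -> bool:
    """Simplified model: only a recorded one-shot, non-probe action
    (such as a start) whose parameters changed forces a restart.

    `changed` holds the (action, interval_ms) pairs whose parameters changed.
    """
    for entry in history:
        if (entry.action, entry.interval_ms) not in changed:
            continue
        if entry.interval_ms > 0:
            continue  # recurring monitor: rescheduled with the new parameters
        if entry.action == "monitor":
            continue  # probe (monitor with interval 0): ignored
        return True   # e.g. a start whose parameters changed
    return False


# Tracing changed the start and the recurring monitor; assume the probe's
# recorded parameters differ too, to show that it is ignored anyway.
changed = {("start", 0), ("monitor", 30000), ("monitor", 0)}

first_case = [HistoryEntry("start", 0), HistoryEntry("monitor", 30000)]
second_case = [HistoryEntry("monitor", 0), HistoryEntry("monitor", 30000)]

print(needs_restart(first_case, changed))   # True
print(needs_restart(second_case, changed))  # False
```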

"vm-genetrap_monitor_0". Is that a probe?

Bernd



Re: [ClusterLabs] Antw: [EXT] trace of resource ‑ sometimes restart, sometimes not

2022-10-18 Thread Lentes, Bernd

- On 18 Oct, 2022, at 14:35, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de 
wrote:
> 
> # crm configure
> edit ...
> verify
> ptest nograph #!!!
> commit

That's very helpful; I didn't know that. Thanks.

> --
> If you used that, you would have seen the restart.
> Despite that, I wonder why enabling tracing for start/stop must induce a
> resource restart.
> 
> Bernd, are you sure that was the only thing changed? Do you have a record of
> commands issued?

I'm pretty sure it was the only thing.

Bernd



[ClusterLabs] Antw: [EXT] Re: trace of resource ‑ sometimes restart, sometimes not

2022-10-18 Thread Ulrich Windl
>>> Ken Gaillot wrote on 07.10.2022 at 01:08:

[...]
> 
> trace_ra is unusual in that it's supported automatically by the OCF
> shell functions, rather than by the agents directly. That means it's
> not advertised in metadata. Otherwise agents could mark it as
> reloadable, and reload would be a quick no‑op.
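To illustrate Ken's point: in OCF 1.1 metadata an agent can mark a parameter with `reloadable="1"`, and recent Pacemaker versions can then apply a change to it with a reload instead of a restart. The Python sketch below is a simplified, hypothetical model: the agent "example-agent", its parameters and both helper functions are invented for illustration, and Pacemaker's real decision takes more into account. It shows why a parameter the agent never advertises, such as trace_ra, always falls through to a restart.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCF 1.1-style metadata for an imaginary agent.
METADATA = """\
<resource-agent name="example-agent" version="0.1">
  <version>1.1</version>
  <parameters>
    <parameter name="config" reloadable="0"/>
    <parameter name="loglevel" reloadable="1"/>
  </parameters>
</resource-agent>
"""


def reloadable_params(metadata_xml: str) -> set[str]:
    """Names of the parameters the agent advertises as reloadable."""
    root = ET.fromstring(metadata_xml)
    return {p.get("name") for p in root.iter("parameter")
            if p.get("reloadable") == "1"}


def change_handling(changed: set[str], reloadable: set[str]) -> str:
    """Simplified decision: reload only if every changed parameter is reloadable."""
    if not changed:
        return "none"
    return "reload" if changed <= reloadable else "restart"


reloadable = reloadable_params(METADATA)
print(change_handling({"loglevel"}, reloadable))  # reload
# trace_ra is not in the metadata, so it can never count as reloadable.
print(change_handling({"trace_ra"}, reloadable))  # restart
```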

That reads like "Dirty hack, dirty consequences" ;-)

[...]

Regards,
Ulrich



[ClusterLabs] Antw: [EXT] trace of resource ‑ sometimes restart, sometimes not

2022-10-18 Thread Ulrich Windl
>>> "Lentes, Bernd" wrote on 06.10.2022 at 21:05 in message
<280206366.19581344.1665083124300.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
> 
> I have some problems with our DLM, so I wanted to trace it. Yesterday I just
> set a trace for "monitor". There was no restart of DLM afterwards; it went
> fine, as expected.
> I got logs in /var/lib/heartbeat/trace_ra. After some monitors I stopped
> tracing.
> 
> Today I set a trace for all operations.
> Now the resource DLM restarted:
> * Restart    dlm:0     ( ha-idg-1 )  due to resource definition change
> I didn't expect that, so I had some trouble.

# crm configure
edit ...
verify
ptest nograph #!!!
commit

--
If you used that, you would have seen the restart.
Despite that, I wonder why enabling tracing for start/stop must induce a
resource restart.

Bernd, are you sure that was the only thing changed? Do you have a record of
commands issued?

> Is the difference in this behaviour intentional? If yes, why? Is there a
> rule?
> 
> Furthermore, I'd like to ask where I can find more information about DLM,
> because it is a mystery to me.
> Sometimes the DLM does not respond to the "monitor", so it needs to be
> restarted, and with it all dependent resources (which are a lot).
> This happens under some load (although the system is not completely
> overwhelmed).

Regards,
Ulrich

> 
> Thanks.
> 
> Bernd
> 
> -- 
> Bernd Lentes 
> System Administrator 
> Institute for Metabolism and Cell Death (MCD) 
> Building 25 - office 122 
> HelmholtzZentrum München 
> bernd.len...@helmholtz-muenchen.de 
> phone: +49 89 3187 1241
>+49 89 3187 49123 
> fax:   +49 89 3187 2294 
> https://www.helmholtz-munich.de/en/mcd 
> 


Re: [ClusterLabs] crm resource trace

2022-10-18 Thread Klaus Wenninger
On Mon, Oct 17, 2022 at 9:42 PM Ken Gaillot  wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
>
> It would probably make sense to distinguish between probes that found
> the resource running and probes that found it not running. Parameter
> changes in the former should probably be treated like start.
>
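Ken's proposed refinement can be sketched the same way. This is a hypothetical Python model of the proposal, not existing Pacemaker behavior. The names `counts_as_start` and `needs_restart_proposed` are invented, while 0 (OCF_SUCCESS) and 7 (OCF_NOT_RUNNING) are the standard OCF exit codes a probe returns for "running" and "stopped". Under this rule, Bernd's second case (history cleaned, re-probed, found running) would also lead to a restart.

```python
from typing import NamedTuple

OCF_SUCCESS = 0      # standard OCF exit code: resource is running
OCF_NOT_RUNNING = 7  # standard OCF exit code: resource is stopped


class HistoryEntry(NamedTuple):
    action: str
    interval_ms: int
    rc: int  # exit code recorded for the action


def counts_as_start(entry: HistoryEntry) -> bool:
    """Proposed rule: a probe that found the resource running stands in for a start."""
    if entry.action == "start":
        return True
    is_probe = entry.action == "monitor" and entry.interval_ms == 0
    return is_probe and entry.rc == OCF_SUCCESS


def needs_restart_proposed(history: list[HistoryEntry],
                           start_params_changed: bool) -> bool:
    """Restart if the start parameters changed and something counts as a start."""
    return start_params_changed and any(counts_as_start(e) for e in history)


reprobed_running = [HistoryEntry("monitor", 0, OCF_SUCCESS),
                    HistoryEntry("monitor", 30000, OCF_SUCCESS)]
probed_stopped = [HistoryEntry("monitor", 0, OCF_NOT_RUNNING)]

print(needs_restart_proposed(reprobed_running, True))  # True
print(needs_restart_proposed(probed_stopped, True))    # False
```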

Which leaves the non-trivial task to the RA of determining, during a probe,
whether a resource is not just running or stopped but has been started with
exactly those parameters - right? That may be easy for some RAs and a real
issue for others, not to mention which RAs already implement it that way.
The error code would be generic-error, to trigger a stop-start - right?
If I'm getting it right, even without that in-depth checking on probe it
would have worked in this case, as the probe happens before the parameter
change.
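One way an agent could do the check Klaus describes is to record a digest of its parameters at start and compare it during the probe. The Python sketch below is hypothetical: real resource agents are usually shell scripts, the state file is an assumed stand-in for actually checking the service, and `start`, `probe` and `params_digest` are invented names. OCF_ERR_GENERIC (1) is returned on a mismatch because, as Klaus suggests, that should make the cluster recover the resource with a stop and start.

```python
import hashlib
import json
import os
import tempfile

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def params_digest(params: dict[str, str]) -> str:
    """Order-independent hash of the instance parameters."""
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()


def start(state_file: str, params: dict[str, str]) -> int:
    # A real agent would start the service here; this sketch only records
    # which parameters it was started with.
    with open(state_file, "w") as f:
        f.write(params_digest(params))
    return OCF_SUCCESS


def probe(state_file: str, params: dict[str, str]) -> int:
    # Assumption: the state file's existence stands in for "service running".
    if not os.path.exists(state_file):
        return OCF_NOT_RUNNING
    with open(state_file) as f:
        recorded = f.read()
    if recorded != params_digest(params):
        # Running, but not with these parameters: report a generic error.
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, "vm-genetrap.state")
    params = {"config": "vm.xml"}  # hypothetical parameter
    print(probe(state, params))                       # 7
    start(state, params)
    print(probe(state, params))                       # 0
    print(probe(state, {**params, "trace_ra": "1"}))  # 1
```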

Klaus

>
> On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:
> > Hi,
> >
> > I'm trying to find out why the resource is sometimes restarted
> > and sometimes not.
> > Unpredictable behaviour is something I expect from Windows, not from
> > Linux.
> > Here you see two runs of "crm resource trace <resource>".
> > In the first case the resource is restarted; in the second it is not.
> > The command I used is identical in both cases.
> >
> > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> > Fri Oct 14 19:05:51 CEST 2022
> > INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
> > INFO: Trace set, restart vm-genetrap to trace non-monitor operations
> >
> > =
> > =
> >
> > 1st try:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  Diff: --- 7.28974.3 2
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  +  /cib:  @epoch=28975, @num_updates=0
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++     <nvpair name="trace_ra" value="1" id="vm-genetrap-monitor-30-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++   </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++     <nvpair name="trace_ra" value="1" id="vm-genetrap-stop-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++   </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++     <nvpair name="trace_ra" value="1" id="vm-genetrap-start-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++   </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++     <nvpair name="trace_ra" value="1" id="vm-genetrap-migrate_from-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++   </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op:  ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
> > Oct 14 19:05:52