Re: [ClusterLabs] crm resource trace
On Tue, 2022-10-18 at 20:48 +0200, Lentes, Bernd wrote:
> - On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:
>
> > This turned out to be interesting.
> >
> > In the first case, the resource history contains a start action and a
> > recurring monitor. The parameters to both change, so the resource
> > requires a restart.
> >
> > In the second case, the resource's history was apparently cleaned at
> > some point, so the cluster re-probed it and found it running. That
> > means its history contained only the probe and the recurring monitor.
> > Neither probe nor recurring monitor changes require a restart, so
> > nothing is done.
>
> "vm-genetrap_monitor_0". Is that a probe?
>
> Bernd

Yes

-- 
Ken Gaillot

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
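Ken's explanation can be modeled in a few lines. This is a hypothetical Python sketch, not Pacemaker's implementation: it assumes operation keys of the form `<resource>_<action>_<interval in ms>` (so `vm-genetrap_monitor_0` is a monitor with interval 0, i.e. a probe), and the helper names `parse_op_key`, `is_probe` and `needs_restart` are invented for illustration. It encodes only the rule stated above: a changed start in the history forces a restart, while probe and recurring-monitor changes do not.

```python
# Hypothetical sketch of the rule Ken describes; not Pacemaker's real code.

# Action names that themselves contain an underscore.
UNDERSCORE_ACTIONS = ("migrate_to", "migrate_from")


def parse_op_key(key):
    """Split an op key '<resource>_<action>_<interval_ms>' into its parts."""
    head, interval = key.rsplit("_", 1)
    for action in UNDERSCORE_ACTIONS:
        if head.endswith("_" + action):
            return head[: -len(action) - 1], action, int(interval)
    resource, action = head.rsplit("_", 1)
    return resource, action, int(interval)


def is_probe(key):
    """A probe is a one-shot monitor, i.e. a monitor with interval 0."""
    _, action, interval = parse_op_key(key)
    return action == "monitor" and interval == 0


def needs_restart(history, changed_actions):
    """history: op keys recorded for the resource.
    changed_actions: names of actions whose parameters changed.

    Probes and recurring monitors are skipped: per Ken, changes to them
    do not require a restart. Only a recorded start whose parameters
    changed triggers one.
    """
    for key in history:
        _, action, _ = parse_op_key(key)
        if action == "start" and "start" in changed_actions:
            return True
    return False
```

With tracing enabled for all operations, the two histories from the thread behave differently (the 30000 ms interval assumes the 30-second monitor suggested by the op id `vm-genetrap-monitor-30`): `needs_restart(["vm-genetrap_start_0", "vm-genetrap_monitor_30000"], {"start", "monitor"})` is `True`, while the re-probed history `["vm-genetrap_monitor_0", "vm-genetrap_monitor_30000"]` yields `False`.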
Re: [ClusterLabs] crm resource trace
- On 17 Oct, 2022, at 21:41, Ken Gaillot kgail...@redhat.com wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.

"vm-genetrap_monitor_0". Is that a probe?

Bernd
Re: [ClusterLabs] Antw: [EXT] trace of resource ‑ sometimes restart, sometimes not
- On 18 Oct, 2022, at 14:35, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

> # crm configure
> edit ...
> verify
> ptest nograph #!!!
> commit

That's very helpful. I didn't know that. Thanks.

> --
> If you used that, you would have seen the restart.
> Despite that, I wonder why enabling tracing for start/stop must induce a
> resource restart.
>
> Bernd, are you sure that was the only thing changed? Do you have a record of
> commands issued?

I'm pretty sure it was the only thing.

Bernd
[ClusterLabs] Antw: [EXT] Re: trace of resource ‑ sometimes restart, sometimes not
>>> Ken Gaillot wrote on 07.10.2022 at 01:08 in message:
[...]
>
> trace_ra is unusual in that it's supported automatically by the OCF
> shell functions, rather than by the agents directly. That means it's
> not advertised in metadata. Otherwise agents could mark it as
> reloadable, and reload would be a quick no-op.

That reads like "Dirty hack, dirty consequences" ;-)

[...]

Regards,
Ulrich
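Ken's point about metadata can be illustrated with a hypothetical reload-or-restart check. This sketch assumes OCF 1.1-style metadata, where a parameter may carry `reloadable="1"` and the agent advertises a `reload-agent` action; the metadata snippet and the function names are invented, and Pacemaker's actual logic (including legacy `reload` handling) is more involved. The key observation is that a parameter absent from metadata, such as `trace_ra`, can never be classified as reloadable, so it falls through to a restart.

```python
import xml.etree.ElementTree as ET

# Invented metadata for a hypothetical agent (OCF 1.1 style).
EXAMPLE_METADATA = """\
<resource-agent name="example" version="1.0">
  <version>1.1</version>
  <parameters>
    <parameter name="config" reloadable="0">
      <content type="string"/>
    </parameter>
    <parameter name="loglevel" reloadable="1">
      <content type="string"/>
    </parameter>
  </parameters>
  <actions>
    <action name="start" timeout="20s"/>
    <action name="reload-agent" timeout="20s"/>
  </actions>
</resource-agent>
"""


def reloadable_params(metadata_xml):
    """Names of parameters the agent marks as reloadable."""
    root = ET.fromstring(metadata_xml)
    return {p.get("name") for p in root.iter("parameter")
            if p.get("reloadable") == "1"}


def supports_reload(metadata_xml):
    """True if the agent advertises a reload-agent action."""
    root = ET.fromstring(metadata_xml)
    return any(a.get("name") == "reload-agent" for a in root.iter("action"))


def action_for_change(changed_params, metadata_xml):
    """Return 'none', 'reload' or 'restart' for a set of changed parameters."""
    if not changed_params:
        return "none"
    # A parameter missing from the metadata (like trace_ra) is never in
    # the reloadable set, so any change to it forces a restart here.
    if supports_reload(metadata_xml) and set(changed_params) <= reloadable_params(metadata_xml):
        return "reload"
    return "restart"
```

For example, `action_for_change({"loglevel"}, EXAMPLE_METADATA)` returns `"reload"`, whereas `action_for_change({"trace_ra"}, EXAMPLE_METADATA)` returns `"restart"`. If an agent did list `trace_ra` as reloadable, the same check would choose the cheap reload Ken mentions.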
[ClusterLabs] Antw: [EXT] trace of resource ‑ sometimes restart, sometimes not
>>> "Lentes, Bernd" wrote on 06.10.2022 at 21:05 in message <280206366.19581344.1665083124300.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
>
> I have some problems with our DLM, so I wanted to trace it. Yesterday I just
> set a trace for "monitor". No restart of DLM afterwards. It went fine, as
> expected.
> I got logs in /var/lib/heartbeat/trace_ra. After a few monitors I stopped
> tracing.
>
> Today I set a trace for all operations.
> Now resource DLM restarted:
>   * Restart    dlm:0    ( ha-idg-1 )  due to resource definition change
> I didn't expect that, so I had some trouble.

# crm configure
edit ...
verify
ptest nograph #!!!
commit
--
If you used that, you would have seen the restart.
Despite that, I wonder why enabling tracing for start/stop must induce a
resource restart.

Bernd, are you sure that was the only thing changed? Do you have a record of
commands issued?

> Is the difference in this behaviour intentional? If yes, why? Is there a
> rule?
>
> Furthermore, I'd like to ask where I can find more information about DLM,
> because it is a mystery to me.
> Sometimes the DLM does not respond to the "monitor", so it needs to be
> restarted, and with it all dependent resources (and there are a lot).
> This happens under some load (although the system is not completely
> overwhelmed).

Regards,
Ulrich

>
> Thanks.
>
> Bernd
>
> --
> Bernd Lentes
> System Administrator
> Institute for Metabolism and Cell Death (MCD)
> Building 25 - office 122
> HelmholtzZentrum München
> bernd.len...@helmholtz-muenchen.de
> phone: +49 89 3187 1241
>        +49 89 3187 49123
> fax:   +49 89 3187 2294
> https://www.helmholtz-munich.de/en/mcd
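Ulrich's suggestion works because the preview prints the planned restart before anything is committed. If that preview is captured as text, a small filter can flag restarts automatically. This is a hypothetical Python helper: the name `find_restarts` and the regular expression are illustrative, and the pattern assumes summary lines shaped like the one in Bernd's message (`* Restart dlm:0 ( ha-idg-1 ) due to resource definition change`). Other Pacemaker versions may format the summary differently.

```python
import re

# Matches summary lines such as
#   * Restart    dlm:0    ( ha-idg-1 )  due to resource definition change
# [ \t] is used instead of \s so a match never spans multiple lines.
RESTART_LINE = re.compile(
    r"^[ \t]*\*[ \t]+Restart[ \t]+(\S+)[ \t]+\([ \t]*([^)\n]*?)[ \t]*\)"
    r"(?:[ \t]+due to[ \t]+(.*?))?[ \t]*$",
    re.MULTILINE,
)


def find_restarts(preview_output):
    """Return (resource, node, reason) for every planned restart.

    reason is an empty string when the line gives no "due to" clause.
    """
    return [
        (rsc, node, reason or "")
        for rsc, node, reason in RESTART_LINE.findall(preview_output)
    ]
```

Running it on the line Bernd quoted yields `[("dlm:0", "ha-idg-1", "resource definition change")]`. An empty result suggests the previewed change would not restart anything, although it only inspects the summary text it is given.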
Re: [ClusterLabs] crm resource trace
On Mon, Oct 17, 2022 at 9:42 PM Ken Gaillot wrote:

> This turned out to be interesting.
>
> In the first case, the resource history contains a start action and a
> recurring monitor. The parameters to both change, so the resource
> requires a restart.
>
> In the second case, the resource's history was apparently cleaned at
> some point, so the cluster re-probed it and found it running. That
> means its history contained only the probe and the recurring monitor.
> Neither probe nor recurring monitor changes require a restart, so
> nothing is done.
>
> It would probably make sense to distinguish between probes that found
> the resource running and probes that found it not running. Parameter
> changes in the former should probably be treated like start.
>

That leaves the RA with the non-trivial task of determining, during a probe, whether a resource is not just running or stopped but was started with exactly those parameters - right?
That may be easy for some RAs and a real issue for others. Not to mention which RAs already implement it that way.
The error code would be generic-error to trigger a stop-start - right?
If I understand correctly, even without that in-depth checking on probe it would have worked in this case, as the probe happened before the parameter change.

Klaus

>
> On Mon, 2022-10-17 at 12:43 +0200, Lentes, Bernd wrote:
> > Hi,
> >
> > I'm trying to find out why the resource is sometimes restarted
> > and sometimes not.
> > Unpredictable behaviour is something I expect from Windows, not from
> > Linux.
> > Here you see two runs of "crm resource trace <resource>".
> > In the first case the resource is restarted, in the second it is not.
> > The command I used is identical in both cases.
> >
> > ha-idg-2:~/trace-untrace # date; crm resource trace vm-genetrap
> > Fri Oct 14 19:05:51 CEST 2022
> > INFO: Trace for vm-genetrap is written to /var/lib/heartbeat/trace_ra/
> > INFO: Trace set, restart vm-genetrap to trace non-monitor operations
> >
> > =
> > =
> >
> > 1st try:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: Diff: --- 7.28974.3 2
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: Diff: +++ 7.28975.0 299af44e1c8a3867f9e7a4b25f2c3d6a
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: + /cib: @epoch=28975, @num_updates=0
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-monitor-30']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ <nvpair name="trace_ra" value="1" id="vm-genetrap-monitor-30-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-stop-0']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ <nvpair name="trace_ra" value="1" id="vm-genetrap-stop-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-start-0']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ <nvpair name="trace_ra" value="1" id="vm-genetrap-start-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_from-0']:
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ <nvpair name="trace_ra" value="1" id="vm-genetrap-migrate_from-0-instance_attributes-trace_ra"/>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ </instance_attributes>
> > Oct 14 19:05:52 [25996] ha-idg-1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='vm-genetrap']/operations/op[@id='vm-genetrap-migrate_to-0']:
> > Oct 14 19:05:52