On Mon, 2019-05-20 at 15:29 +0200, Ulrich Windl wrote: > What worries me is "Rejecting name for unique".
Trace messages are often not user-friendly. The rejecting/accepting is nothing to be concerned about; it just refers to which parameters are being used to calculate that particular hash. Pacemaker calculates up to three hashes. The first is a hash of all the resource parameters, to detect if anything changed; this is stored as "op-digest" in the CIB status entries. If the resource is reloadable, another hash is calculated with just the parameters marked as unique=1 (which means they can't be reloaded). Any change in these parameters requires a full restart. This one is "op- restart-digest". Finally, if the resource has sensitive parameters like passwords, a hash of everything but those parameters is stored as "op-secure- digest". This one is only used when simulating CIBs grabbed from cluster reports, which have sensitive info scrubbed. From what's described here, the op-restart-digest is changing every time, which means something is going wrong in the hash comparison (since the definition is not really changing). The log that stands out to me is: trace May 18 23:02:49 calculate_xml_digest_v1(83):0: digest:source <parameters id="0"/> The id is the resource name, which isn't "0". That leads me to: trace May 18 23:02:49 svc_read_output(87):0: Got 499 chars: <parameter name="id" unique="1" required="1"> which is the likely source of the problem. "id" is a pacemaker property, not an OCF resource parameter. It shouldn't be in the resource agent meta-data. Remove that, and I bet it will be OK. BTW the "every 15 minutes" would be the cluster-recheck-interval cluster property. > > > > Kadlecsik József <kadlecsik.joz...@wigner.mta.hu> schrieb am > > > > 20.05.2019 > > um > 14:37 in Nachricht < > alpine.deb.2.20.1905201428050.24...@blackhole.kfki.hu>: > > On Sun, 19 May 2019, Kadlecsik József wrote: > > > > > On Sat, 18 May 2019, Kadlecsik József wrote: > > > > > > > On Sat, 18 May 2019, Kadlecsik József wrote: > > > > > > > > > On Sat, 18 May 2019, Andrei Borzenkov wrote: > > > > > > > > > > > 18.05.2019 18:34, Kadlecsik József пишет: > > > > > > > We have a resource agent which creates IP tunnels. In > > > > > > > spite of the > > > > > > > configuration setting > > > > > > > > > > > > > > primitive tunnel-eduroam ocf:local:tunnel \ > > > > > > > params .... > > > > > > > op start timeout=120s interval=0 \ > > > > > > > op stop timeout=300s interval=0 \ > > > > > > > op monitor timeout=30s interval=30s depth=0 \ > > > > > > > meta target-role=Started > > > > > > > order bifur-eduroam-ipv4-before-tunnel-eduroam \ > > > > > > > Mandatory: bifur-eduroam-ipv4 tunnel-eduroam > > > > > > > colocation tunnel-eduroam-on-bifur-eduroam-ipv4 inf: > > > > > > > tunnel-eduroam > > \ > > > > > > > bifur-eduroam-ipv4:Started > > > > > > > > > > > > > > the resource is restarted again and again. According to > > > > > > > the debug > > logs: > > > > > > > > > > > > > > Parameters to tunnel-eduroam_start_0 on bifur1 > > > > > > > changed: was > > > > > > > 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now > > > > f2317cad3d54cec5d7d7aa7d0bf35cf8 > > > > > > > (restart:3.0.11) 0:0;48:3:0:73562fd6-1fe2-4930-8c6e- > > > > > > > 5953b82ebb32 > > > > > > > > > > > > This means that instance attributes changed in this case > > > > > > pacemaker > > > > > > restarts resource to apply new values. Turning on trace > > > > > > level > > hopefully > > > > > > will show what exactly is being changed. You can also dump > > > > > > CIB > > before > > > > > > and after restart to compare current information. > > > > > > > > > > The strange thing is that the new value seems never be > > > > > stored. Just the > > > > > "was-now" part from the log lines: > > > > > > > > > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now > > > > f2317cad3d54cec5d7d7aa7d0bf35cf8 > > > > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now > > > > f2317cad3d54cec5d7d7aa7d0bf35cf8 > > > > > was 94afff0ff7cfc62f7cb1d5bf5b4d83aa vs. now > > > > f2317cad3d54cec5d7d7aa7d0bf35cf8 > > > > > ... > > > > > > > > > > However, after issuing "cibadmin --query --local", the whole > > > > > flipping > > > > > stopped! :-) Thanks! > > > > > > > > No, I was wrong - it still repeats every ~15mins. The diff > > > > between two > > cib > > > > xml dumps doesn't say much to me - I'm going to enable tracing. > > > > > > I have attached the trace file created according to > > > http://blog.clusterlabs.org/blog/2013/pacemaker-logging. > > > > > > What looks strange to me is that build_parameter_list() first > > > rejects > > > attributes, then accepts them: > > > > > > trace May 18 23:02:49 build_operation_update(787):0: Including > > > additional > > digests for ocf::local:tunnel > > > trace May 18 23:02:49 build_parameter_list(621):0: Rejecting > > > name for > > > > unique > > > trace May 18 23:02:49 build_parameter_list(614):0: Attr id is > > > unique > > > trace May 18 23:02:49 build_parameter_list(632):0: Adding attr > > > id=0 to > > the > > xml result > > > trace May 18 23:02:49 build_parameter_list(621):0: Rejecting > > > src_ip for > > > > unique > > > trace May 18 23:02:49 build_parameter_list(621):0: Rejecting > > > dst_ip for > > > > unique > > > ... > > > trace May 18 23:02:49 calculate_xml_digest_v1(71):0: Sorting > > > xml... > > > trace May 18 23:02:49 calculate_xml_digest_v1(73):0: Done > > > trace May 18 23:02:49 crm_md5sum(2102):0: Beginning digest of > > > 22 bytes > > > trace May 18 23:02:49 crm_md5sum(2110):0: Digest > > > > 94afff0ff7cfc62f7cb1d5bf5b4d83aa. > > > > > > and then: > > > > > > trace May 18 23:02:49 calculate_xml_digest_v1(83):0: > > > digest:source > > > > <parameters id="0"/> > > > trace May 18 23:02:49 append_restart_list(693):0: tunnel- > > > eduroam: > > > > 94afff0ff7cfc62f7cb1d5bf5b4d83aa, id > > > trace May 18 23:02:49 append_restart_list(694):0: restart > > > digest source > > <parameters id="0"/> > > > trace May 18 23:02:49 build_parameter_list(621):0: Rejecting > > > name for > > > > private > > > trace May 18 23:02:49 build_parameter_list(625):0: Inverting > > > name match > > > > for private xml > > > trace May 18 23:02:49 build_parameter_list(632):0: Adding attr > > > > name=eduroam IPv4 tunnel to the xml result > > > trace May 18 23:02:49 build_parameter_list(621):0: Rejecting id > > > for > > > > private > > > trace May 18 23:02:49 build_parameter_list(625):0: Inverting id > > > match for > > private xml > > > trace May 18 23:02:49 build_parameter_list(632):0: Adding attr > > > id=0 to > > the > > xml result > > > .... > > > > > > By the way, it's debian stretch with pacemaker 1.1.16-1. > > > > I have double and triple checked the agent and it seems just a > > normal, > > working agent. > > > > The agent accepts the reload operation, it is advertised in the > > actions > > section of its metadata, there are parameters with unique set to 0 > > and > > still stop/start is called instead of reload. (I could even live > > with > > reload instead of start/stop in every 15 mins). > > > > As a desperate attempt, I deleted the resource and re-added and it > > of > > course did not help. > > > > I also created the attached trace file during creating the resource > > in the > > hope that it could help find the reason of the permanent > > stop/start. > > > > Best regards, > > Jozsef > > -- > > E-mail : kadlecsik.joz...@wigner.mta.hu > > PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt > > Address: Wigner Research Centre for Physics, Hungarian Academy of > > Sciences > > H-1525 Budapest 114, POB. 49, Hungary > > > > _______________________________________________ > Manage your subscription: > https://lists.clusterlabs.org/mailman/listinfo/users > > ClusterLabs home: https://www.clusterlabs.org/ -- Ken Gaillot <kgail...@redhat.com> _______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/