Hi,

I've run into what looks at first blush to be a CIB bug in writing to disk.

The key messages from this incident are these:


Mar 31 19:02:52 vhost0384 cib: [13294]: ERROR: validate_cib_digest: Digest comparision failed: expected 316049fa7ee8d2e107573ce7cded07cf (/var/lib/heartbeat/crm/cib.GUdD9T), calculated 0bac3440f5c42f0f37d22ea7dfe433e8
Mar 31 19:02:52 vhost0384 cib: [13294]: ERROR: retrieveCib: Checksum of /var/lib/heartbeat/crm/cib.uHFtAW failed! Configuration contents ignored!
Mar 31 19:02:52 vhost0384 cib: [13294]: ERROR: retrieveCib: Usually this is caused by manual changes, please refer to http://clusterlabs.org/wiki/FAQ#cib_changes_detected
Mar 31 19:02:52 vhost0384 cib: [13294]: WARN: retrieveCib: Continuing but /var/lib/heartbeat/crm/cib.uHFtAW will NOT used.


I did not make manual changes on a running CIB. I was using the cluster shell at the time. The file it is complaining about appears to be an intact, valid CIB whose contents are approximately what they should have been at the time. By the way, I have a report from another IBMer who has seen systems that stop writing to their local CIBs. I'll contact him.

Here are some relevant facts:
  These machines are virtual guests in a cloud somewhere - operations
        have somewhat unpredictable latency.  But, nothing too egregious
        was happening at the time or Heartbeat would have bitched.
  I was doing some testing at the time.  I was putting on and
        taking off constraints using the cluster shell
        migrate and unmigrate operations.

Given that the file looks intact, and I know how the CIB is written to disk (since I originally wrote that code), I wonder if it isn't a versioning issue / race condition. That is, the code for writing to disk does NOT guarantee when it gets done (assuming you're still using it). It would be easy to do a checksum on the wrong version compared to the version you thought it should be (or before it completed).
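To make the suspected race concrete, here is a minimal Python sketch. It is not the actual CIB code: the file names, the `.sig` suffix, and the use of MD5 over the raw file bytes are all assumptions chosen for illustration. It shows how a fully intact file can still fail a digest check when the check runs between the write of the new contents and the write of the matching digest.

```python
import hashlib
import os
import tempfile


def md5_hex(data: bytes) -> str:
    # Assumption: MD5 over raw bytes; the real digest method may differ.
    return hashlib.md5(data).hexdigest()


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old or the new file, never a partial one."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def verify(path: str, sig_path: str) -> bool:
    """Return True if the stored digest matches the file contents."""
    with open(path, "rb") as f:
        data = f.read()
    with open(sig_path) as f:
        expected = f.read().strip()
    return md5_hex(data) == expected


def demo(workdir: str) -> list:
    """Walk through a two-step update and record the check result at each point."""
    cfg = os.path.join(workdir, "config.xml")  # hypothetical file name
    sig = cfg + ".sig"                         # hypothetical digest file
    v1 = b"<cib epoch='1'/>"
    v2 = b"<cib epoch='2'/>"
    results = []

    atomic_write(cfg, v1)
    atomic_write(sig, md5_hex(v1).encode())
    results.append(verify(cfg, sig))  # file and digest are the same version

    atomic_write(cfg, v2)
    results.append(verify(cfg, sig))  # new file, old digest: intact but "fails"

    atomic_write(sig, md5_hex(v2).encode())
    results.append(verify(cfg, sig))  # consistent again
    return results
```

Each individual write here is atomic, yet the pair (contents, digest) is not, so a reader in the middle window compares two different versions. If something like this is happening, the file would look valid on inspection, which matches what I see.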

Andrew: You should have already received all the relevant logs in a separate email.

Also, for my reference - what method are you using to compute the digest of the file? That is, what command should I execute to get the same results?

--
    Alan Robertson <al...@unix.sh>

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
