On 11/02/19 15:03 -0600, Ken Gaillot wrote:
> On Fri, 2019-02-01 at 08:10 +0100, Jan Pokorný wrote:
>> On 28/01/19 09:47 -0600, Ken Gaillot wrote:
>>> On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
>>> Pacemaker can handle the clock jumping forward, but not backward.
>>
>> I am rather surprised, are we not using monotonic time only, then?
>> If so, why?
>
> The scheduler runs on a single node (the DC) but must take as input
> the resource history (including timestamps) on all nodes. We need
> wall clock time to compare against time-based rules.
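(For concreteness, the distinction here is between the two POSIX clock
sources; a minimal Python illustration, nothing Pacemaker-specific --
the "business hours" rule is just a made-up example of a time-based
rule:)

```python
import time

# Time-based rules (e.g. "run only between 09:00 and 17:00") need the
# wall clock: CLOCK_MONOTONIC counts from an arbitrary per-boot epoch
# and carries no calendar meaning, so it cannot answer "is it 9am?".
wall = time.clock_gettime(time.CLOCK_REALTIME)   # seconds since Unix epoch
mono = time.clock_gettime(time.CLOCK_MONOTONIC)  # arbitrary per-boot origin

hour = time.localtime(wall).tm_hour
in_business_hours = 9 <= hour < 17  # hypothetical rule evaluation
```

The monotonic clock, on the other hand, is what you want for "which of
these two local events happened first", since it never jumps backward.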
Yep, was aware of the troubles with this.

> Also, if we get two resource history entries from a node, we don't
> know if it rebooted in between, so a monotonic timestamp alone
> wouldn't be sufficient.

Ah, that's along the lines of Ulrich's response, I see it now, thanks.
I am not sure if there could be a step around that using boot IDs when
provided by the platform (like the hashes in the case of systemd),
assuming an equality test is all that's needed: both histories cannot
arrive at the same time, and presumably the FIFO ordering is preserved
-- it shall be, under normal circumstances, i.e. no time travels and
a sane, non-byzantine process.  Violations might even be partially
detected and acted upon (two differing histories without any
recollection that the node was fenced by the cluster?  fence it, for
sure and good measure!).

> However, it might be possible to store both time representations in
> the history (and possibly maintain some sort of cluster knowledge
> about monotonic clocks to compare them within and across nodes), and
> use one or the other depending on the context. I haven't tried to
> determine how feasible that would be, but it would be a major
> project.

Just thinking aloud: the DC node, once it has won the election, would
cause the other nodes (incl. newcomers during its rule) to (re)set the
offsets between the DC's and their own internal wall-clock time (for
time-based rules, should they ever become DC on their own).  All nodes
would then (re)establish a monotonic to wall-clock time conversion
based on these one-off inputs, and from that point on they can operate
fully detached from the wall clock.  Changing the wall clock would
thus have no immediate effect on the cluster, unless (re)harmonizing
is desired via some configured event handler that could reschedule the
plan appropriately, or delay the sync to a more suitable moment.
This indeed counts on pacemaker being used in full-fledged cluster
mode, i.e. requiring quorum (not to speak of fencing).
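To make both ideas concrete, a minimal Python sketch (purely
hypothetical illustration, not Pacemaker code; the /proc path is the
standard Linux source of the per-boot random UUID, and the same value
is what systemd exposes):

```python
import time
from pathlib import Path

# Standard Linux interface for the per-boot random UUID; only an
# equality test between two recorded IDs is needed to detect an
# unnoticed reboot (the IDs themselves carry no ordering).
BOOT_ID_PATH = Path("/proc/sys/kernel/random/boot_id")

def current_boot_id() -> str:
    return BOOT_ID_PATH.read_text().strip()

def rebooted_between(boot_id_a: str, boot_id_b: str) -> bool:
    """Two history entries carrying different boot IDs imply the node
    rebooted in between them."""
    return boot_id_a != boot_id_b

class DetachedClock:
    """Sketch of the 'detached' scheme: capture the wall-clock vs.
    monotonic offset once (e.g. when the DC wins the election and
    distributes it), then derive every timestamp from the monotonic
    clock alone, so later wall-clock changes have no immediate
    effect on the cluster."""

    def __init__(self) -> None:
        # One-off sync point; never consulted again afterwards.
        self.offset = (time.clock_gettime(time.CLOCK_REALTIME)
                       - time.clock_gettime(time.CLOCK_MONOTONIC))

    def now(self) -> float:
        # Wall-clock-like timestamp, immune to later clock jumps.
        return time.clock_gettime(time.CLOCK_MONOTONIC) + self.offset
```

A (re)harmonizing event handler would then amount to recomputing
`offset` at a moment the cluster deems safe, rather than every
timestamp silently shifting the instant someone touches the clock.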
As a corollary, with such a scheme, time-based rules would only be
allowed when quorum is fully honoured (after all, it makes more sense
to employ cron or systemd timer units otherwise, since they all use
local time only, which is then the right fit, as opposed to
a distributed system).

>> We shall not need any explicit time synchronization across the
>> nodes, since we are already backed by extended virtual synchrony
>> from corosync, even though it could introduce strangenesses when
>> time-based rules kick in.
>
> Pacemaker determines the state of a resource by replaying its
> resource history in the CIB. A history entry can be replaced only by
> a newer event. Thus if there's a start event in the history, and
> a stop result comes in, we have to know which one is newer to
> determine whether the resource is started or stopped.

But for that, boot ID might be a sufficient determinant, since
a result (keyed with boot ID ABC) for an action never triggered while
boot ID XYZ is deemed "current" ATM means that something is off, and
fencing is perhaps the best choice.

> Something along those lines is likely the cause of:
>
> https://bugs.clusterlabs.org/show_bug.cgi?id=5246

I believe the "detached" scheme sketched above could solve this.
Problem is, the devil is in the details, so it can be unpredictably
hard to get it right in a distributed environment.

--
Jan (Poki)
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org