On Wed, 2017-12-13 at 18:40 +0100, Lennart Poettering wrote: > On Mi, 13.12.17 00:43, Uoti Urpala (uoti.urp...@pp1.inet.fi) wrote: > > correct position, so I don't see why the monotonicity discard would be > > needed for that case either? > > I figure you are right. > > Any chance you can prep a patch for this change?
A patch with a fairly long commit message explaining the rationale and related issues attached. BTW current sd_journal_next() documentation claims "The journal is strictly ordered by reception time, and hence advancing to the next entry guarantees that the entry then pointing to is later in time than then previous one, or has the same timestamp." Isn't this OBVIOUSLY false when talking about realtime timestamps if you don't include caveats about changing the machine time?
From 7070d7a3158cb0e0d14ba6167f7e0548a0022f55 Mon Sep 17 00:00:00 2001 From: Uoti Urpala <uau@glyph.nonexistent.invalid> Date: Thu, 14 Dec 2017 01:41:36 +0200 Subject: [PATCH] journal: fix lost entries when timestamps are inconsistent When journalctl or the sd_journal_* API interleaved entries from multiple journal files into one sequence, inconsistent ordering information in the files could lead to entries being silently discarded. Fix this so that all entries are shown at least in some order, even if no guarantees are made about the order. The code determining the interleaved order of entries is based on pairwise comparisons, where a pair of entries is compared using the first applicable of three progressively less reliable ways to compare: first by sequence number if seqnum id matches, then by monotonic timestamp if boot id matches, then by realtime timestamp (entries within one journal file are always implicitly ordered by their order in the file, which should match seqnum order unless the file is corrupt). Unfortunately, while this method could be said to give the "right" answer more often than just comparing by the always-available realtime timestamp, it does not give a globally consistent order - if the ways to compare do not agree, it's possible to have three entries A, B and C such that A<B<C<A. For example as follows: A: seqnum_id=0, seqnum=1, time=0 B: seqnum_id=1, seqnum=0, time=1 C: seqnum_id=0, seqnum=0, time=2 A<B by time, B<C by time, C<A by seqnum. The interleaving code goes through all files and checks whether the next entry from that file is earlier than the current candidate entry, and so finds the earliest available one. Given entries A, B, C like above, it could pick any of them depending on the order in which the files happen to be iterated over. In this situation, entries could be lost because of a check in next_beyond_location() that discarded entries which compare <= the previously chosen entry. But whichever of the three entries we choose, there's at least one that still compares smaller and so would be silently discarded. The rationale for this check given in comments is to discard duplicate entries that somehow ended up in multiple journal files. But checking for full equality should be enough for this purpose. Early code seems to have checked for equality, then it was changed in the restructuring in commit de190aef08bb267 ("journal: implement multiple field matches") for some reason. Change the code to only discard an entry if it's verified to be fully equal to one in another file. There doesn't seem to be any obvious reason why the monotonicity check would be needed (not anything remembered by its author either). This should guarantee that entries are at least not silently discarded. However, some ordering-related issues remain that should probably be at least documented if not fixed: - When ordering information is inconsistent, the order of entries is arbitrary, and can change depending on factors should as directory list order used by the filesystem containing the journal files, what match terms if any were given, and iterating direction. - As a consequence, sd_journal_previous() is not the inverse of sd_journal_next(), and can take you to different entries than where you came from using the other one. If the inconsistency is limited, for example realtime clock goes back a couple of minutes at boot, then this should normally only affect ordering within the overlapping period. However, if you have at least one journal file where the clock is seriously wrong while the other information indicates another order (or other causes for a major discrepancy, such as outright corrupt files) then very little is guaranteed about the resulting order. Guaranteeing a fully stable order (that wouldn't change depending on match terms etc) would likely be hard without a global view of all entries, but for example guaranteeing that all entries that share a seqnum id are ordered by their seqnum with respect to each other should be doable. --- src/journal/sd-journal.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/src/journal/sd-journal.c b/src/journal/sd-journal.c index 8a1b161d8..5dedd9ea2 100644 --- a/src/journal/sd-journal.c +++ b/src/journal/sd-journal.c @@ -798,13 +798,10 @@ static int next_beyond_location(sd_journal *j, JournalFile *f, direction_t direc for (;;) { bool found; - if (j->current_location.type == LOCATION_DISCRETE) { - int k; - - k = compare_with_location(f, &j->current_location); - - found = direction == DIRECTION_DOWN ? k > 0 : k < 0; - } else + if (j->current_location.type == LOCATION_DISCRETE) + // only == 0 or not matters + found = compare_with_location(f, &j->current_location); + else found = true; if (found) -- 2.15.1
_______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel