On Mo, 11.12.17 00:36, Uoti Urpala (uoti.urp...@pp1.inet.fi) wrote:

> The code that tries to determine which of two journal entries is
> "earlier" (similar logic in journal_file_compare_locations() and
> compare_with_location()) does this by trying progressively less
> reliable ways to compare: first by seqnum if the seqnum ids match,
> then by monotonic time if the boot ids match, then by realtime
> timestamp. Unfortunately, while this method could be said to give
> the "right" answer more often than just comparing by the
> always-available realtime timestamp, it does not give a valid
> ordering relation. That is, unless you can already rely on all those
> comparisons giving the same consistent order (in which case you
> could probably just use realtime anyway), it's possible to get a
> relation like A<B<C<A.
>
> For example, these entries would have such a relation (assume that
> the omitted monotonic values never determine the order):
>
> A: seqnum_id=0, seqnum=0, time=2
> B: seqnum_id=0, seqnum=1, time=0
> C: seqnum_id=1, seqnum=0, time=1
>
> A<B by seqnum, B<C by time, C<A by time.
>
> Such an ordering means that the code interleaving entries from
> different files can give rather arbitrary results depending on
> exactly which order the entries are tested in. It triggers at least
> one issue that I consider a clear bug: there's code in
> next_beyond_location() which skips the next entry in a file if it's
> not in the expected direction from the previous globally iterated
> entry, and this can discard valid entries. A comment there says it's
> meant to discard duplicate entries which were somehow recorded in
> multiple journal files (which I'd assume to compare equal), but it
> also discards non-duplicate entries which compare backwards from the
> previously shown one.
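To make the cycle concrete, here is a minimal, self-contained sketch
of the comparison cascade described above. The struct and field names
are simplified stand-ins, not the actual systemd structures; running
it prints "-1 -1 -1", i.e. A<B, B<C and C<A:

#include <stdint.h>
#include <stdio.h>

typedef struct Entry {
        uint64_t seqnum_id;  /* stand-in for the 128-bit seqnum id */
        uint64_t seqnum;
        uint64_t realtime;   /* wall-clock timestamp */
} Entry;

/* Cascade as described above: seqnum decides if the seqnum ids
 * match, otherwise fall back to realtime. (The monotonic/boot-id
 * step is omitted, matching the example's assumption that it never
 * determines the order.) */
static int compare_entries(const Entry *a, const Entry *b) {
        if (a->seqnum_id == b->seqnum_id)
                return a->seqnum < b->seqnum ? -1 :
                       a->seqnum > b->seqnum ?  1 : 0;

        return a->realtime < b->realtime ? -1 :
               a->realtime > b->realtime ?  1 : 0;
}

int main(void) {
        Entry A = { .seqnum_id = 0, .seqnum = 0, .realtime = 2 };
        Entry B = { .seqnum_id = 0, .seqnum = 1, .realtime = 0 };
        Entry C = { .seqnum_id = 1, .seqnum = 0, .realtime = 1 };

        printf("%d %d %d\n",
               compare_entries(&A, &B),   /* -1: A<B by seqnum */
               compare_entries(&B, &C),   /* -1: B<C by time   */
               compare_entries(&C, &A));  /* -1: C<A by time   */
        return 0;
}

No total order exists that is consistent with all three pairwise
results, which is exactly why the interleaving result can depend on
the order in which entries happen to be tested.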
Note that two entries will only compare as fully identical if their
"xor_hash" is equal too. The xor_hash is the XOR combination of the
hashes of all of the entry's fields, which means that, realistically,
only records that actually are identical should be considered
duplicates.

> The reason I originally looked into this code was a set of old
> journal files for which plain "journalctl" only showed 5 days of
> entries despite the files containing months of data. This was
> triggered by a user journal file which shared seqnum ids with some
> system journal files, but had smaller seqnums from a much later time
> (the earliest logs were old enough that the file could have been
> initialized by journald before commit cbd671772c, so this exact
> trigger probably won't occur with files created by later versions).
> The code searching for earlier entries first found the early system
> journal, then moved to the user journal because it had smaller
> seqnums, and finally moved to some other file that started 5 days
> before the end (earlier than the user journal and with a different
> seqnum id - there didn't happen to be other earlier files later in
> the iteration order). After printing one entry from there, the
> next_beyond_location() "duplicate" code then discarded all the
> earlier valid entries.
>
> I'm not sure about the best way to fix these issues - are there
> supposed to be any guarantees about interleaving? At the very least,
> the "duplicate" check should be fixed so it does not throw away
> arbitrary amounts of valid entries. Is there any other code relying
> on assumptions about valid ordering? Is the interleaving order
> supposed to be stable independently of things like the order in
> which the files are iterated over?

So, the current "duplicate" check is actually not just a duplicate
check: it also guarantees that we progress monotonically and never go
backwards in time, even when journal files are added or removed while
we iterate through them.

I am not entirely sure what we can do here. Maybe this could work:
beef up the comparison logic so that it returns not just
smaller/equal/larger but also a special value, "ambiguous". And when
that is returned, we don't enforce monotonicity strictly but instead
go record-by-record, if you follow what I mean?

Lennart

--
Lennart Poettering, Red Hat
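As an illustration of the proposal above, here is a rough sketch of
what such a four-valued comparison could look like. All names and
types are hypothetical, not actual systemd code, and the choice of
when to report "ambiguous" (here: whenever only the realtime clock is
shared, and for matching seqnums whose xor_hash differs) is just one
possible reading of the idea:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum CompareResult {
        COMPARE_SMALLER,
        COMPARE_EQUAL,
        COMPARE_LARGER,
        COMPARE_AMBIGUOUS,  /* no reliable shared clock: order unknown */
} CompareResult;

typedef struct Location {
        uint8_t  seqnum_id[16];
        uint64_t seqnum;
        uint8_t  boot_id[16];
        uint64_t monotonic;
        uint64_t realtime;
        uint64_t xor_hash;  /* XOR of the hashes of all entry fields */
} Location;

static bool id_equal(const uint8_t a[16], const uint8_t b[16]) {
        return memcmp(a, b, 16) == 0;
}

static CompareResult order_u64(uint64_t a, uint64_t b) {
        return a < b ? COMPARE_SMALLER :
               a > b ? COMPARE_LARGER : COMPARE_EQUAL;
}

CompareResult compare_locations(const Location *a, const Location *b) {
        /* Same writer: seqnums give a reliable order. */
        if (id_equal(a->seqnum_id, b->seqnum_id)) {
                CompareResult r = order_u64(a->seqnum, b->seqnum);

                /* Equal seqnums are a true duplicate only if the
                 * xor_hash matches too. */
                if (r == COMPARE_EQUAL && a->xor_hash != b->xor_hash)
                        return COMPARE_AMBIGUOUS;
                return r;
        }

        /* Same boot: the monotonic clock is still comparable. */
        if (id_equal(a->boot_id, b->boot_id))
                return order_u64(a->monotonic, b->monotonic);

        /* Only wall-clock time is shared; instead of pretending that
         * yields a valid ordering, admit ambiguity so the caller can
         * drop strict monotonicity and go record-by-record. */
        return COMPARE_AMBIGUOUS;
}

A caller such as next_beyond_location() could then treat
COMPARE_AMBIGUOUS as "neither skip nor reorder", keeping the candidate
entry instead of discarding it as a supposed duplicate.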