Smalyshev added a comment.
The dateModified triple is a global check, so yes, if we have multiple updaters running in parallel, a single check might not catch the failure of just one of them.
There's a deeper question behind this, one that is also relevant for the Kafka poller alone. If we have multiple streams of input (as the Kafka poller does), how do we define the "current timestamp"? We have the following options:
- Minimum of the stream positions - the downside is that if one of the streams rarely has any events, say once per day, the timestamp gets stuck behind.
- Maximum of the stream positions - the danger is that if one of the streams lags (and they are always lagging a bit behind one another, since the polling is not 100% parallel, but batched), then the messages in the delta might be lost.
- Some other way? This is why I decided to persist Kafka offsets - this solves the problem of uneven timestamps in different streams.
One timestamp is OK when we're talking about one source (like a dump), but for multiple sources we're likely to have to use multiple timestamps - or, in the Kafka case, offsets.
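
To make the trade-off concrete, here is a minimal sketch in Python. It is not the updater's actual code: the `Event` type, the stream names "busy" and "quiet", and the helper functions are all hypothetical, and the streams are plain in-memory lists rather than real Kafka topics. It compares restarting from a single global timestamp (minimum or maximum of the stream positions) with restarting from one saved offset per stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream: str     # hypothetical stream (topic) name
    offset: int     # position within its own stream
    timestamp: int  # event time, in seconds


def resume_by_timestamp(events, checkpoint_ts):
    """Events replayed when restarting from one global timestamp."""
    return [e for e in events if e.timestamp > checkpoint_ts]


def resume_by_offsets(events, offsets):
    """Events replayed when restarting from one saved offset per stream."""
    return [e for e in events if e.offset > offsets[e.stream]]


# Made-up data: "busy" was polled up to t=105, while "quiet" lags at t=100
# and still has an unread event at t=103.
events = [
    Event("busy", 0, 100), Event("busy", 1, 104), Event("busy", 2, 105),
    Event("quiet", 0, 100), Event("quiet", 1, 103),
]
consumed = {"busy": 2, "quiet": 0}  # last offset processed per stream

# Timestamp of the last processed event in each stream.
last_ts = {
    stream: max(e.timestamp for e in events
                if e.stream == stream and e.offset <= last)
    for stream, last in consumed.items()
}

# Ground truth: what has not been processed yet.
unprocessed = [e for e in events if e.offset > consumed[e.stream]]

# Maximum: the unread "quiet" event at t=103 falls in the delta and is lost.
after_max = resume_by_timestamp(events, max(last_ts.values()))

# Minimum: nothing is lost, but already-processed "busy" events are replayed,
# and the checkpoint cannot advance until "quiet" produces something new.
after_min = resume_by_timestamp(events, min(last_ts.values()))

# Per-stream offsets: exactly the unprocessed events are replayed.
after_offsets = resume_by_offsets(events, consumed)
```

In this toy setup the maximum loses the pending "quiet" event, the minimum replays two "busy" events that were already handled, and the per-stream offsets replay exactly what is outstanding.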
To: Smalyshev
Cc: Gehel, Addshore, Daniel_Mietchen, Smalyshev, Aklapper, RazShuty, LJ, Lahi, Gq86, Darkminds3113, Andrawaag, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, Jonas, FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Lydia_Pintscher, Mbch331
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
