Smalyshev added a comment.

Looks like something weird is going on with the recentchanges stream. Looking at the relevant fragment:

{
  type: "edit",
  ns: 0,
  title: "Q29002776",
  pageid: 30662598,
  revid: 469529345,
  old_revid: 469529329,
  rcid: 498001207,
  timestamp: "2017-03-22T19:14:14Z"
},
{
  type: "edit",
  ns: 0,
  title: "Q26272025",
  pageid: 28223090,
  revid: 469529353,
  old_revid: 468296753,
  rcid: 498001280,
  timestamp: "2017-03-22T19:14:14Z"
},
{
  type: "edit",
  ns: 0,
  title: "Q29002776",
  pageid: 30662598,
  revid: 469529346,
  old_revid: 469529345,
  rcid: 498001208,
  timestamp: "2017-03-22T19:14:15Z"
},

We can see that the entry for Q26272025 has rcid 498001280, which means it was inserted much later than the Q29002776 entries (rcids 498001207 and 498001208), even though its timestamp falls within the same second. Looking at this entry:

{
  type: "edit",
  ns: 0,
  title: "Q29002778",
  pageid: 30662600,
  revid: 469529418,
  old_revid: 469529412,
  rcid: 498001279,
  timestamp: "2017-03-22T19:14:27Z"
}

we see rcid 498001279, immediately before that of the Q26272025 entry (498001280), with timestamp 19:14:27. So the Q26272025 entry appears to have been inserted a full 13 seconds later than its own timestamp (19:14:14) suggests. The WDQS updater currently uses a backoff time of 10 seconds, which explains how it missed the update: when it was processing the stream for that time period, the record for Q26272025 did not exist yet, and by the time it was backpatched in, the updater was too far ahead to check that time window again.
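To make the arithmetic explicit, here is a minimal sketch that derives the 13-second figure from the four entries quoted above. It rests on two assumptions that are mine, not confirmed facts about MediaWiki: that rcid grows in insertion order, and that an entry is never inserted before its own timestamp. The function name `min_insertion_lag` is hypothetical.

```python
from datetime import datetime, timezone

# (title, rcid, timestamp) copied from the fragments quoted above.
ENTRIES = [
    ("Q29002776", 498001207, "2017-03-22T19:14:14Z"),
    ("Q26272025", 498001280, "2017-03-22T19:14:14Z"),
    ("Q29002776", 498001208, "2017-03-22T19:14:15Z"),
    ("Q29002778", 498001279, "2017-03-22T19:14:27Z"),
]

BACKOFF_SECONDS = 10  # backoff the WDQS updater currently uses, per the comment


def parse_ts(ts):
    """Parse an ISO timestamp as returned by the API into an aware datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def min_insertion_lag(entries):
    """Lower bound (seconds) on how late each entry was inserted, keyed by rcid.

    ASSUMPTIONS: rcid increases in insertion order, and no entry is inserted
    before its own timestamp. Under these, an entry cannot have been inserted
    earlier than the latest timestamp among entries with a smaller rcid.
    """
    lags = {}
    latest = None
    for _title, rcid, ts in sorted(entries, key=lambda e: e[1]):
        t = parse_ts(ts)
        late = latest is not None and t < latest
        lags[rcid] = (latest - t).total_seconds() if late else 0.0
        latest = t if latest is None else max(latest, t)
    return lags


lags = min_insertion_lag(ENTRIES)
for rcid, lag in lags.items():
    flag = "exceeds backoff" if lag > BACKOFF_SECONDS else "ok"
    print(rcid, lag, flag)
```

Only rcid 498001280 (Q26272025) comes out with a non-zero lag, and that lag is larger than the 10-second backoff.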

The problem here is that, for now, the only real way of reading the recentchanges stream is by timestamp. It turns out that entries can be backpatched into it with basically no time limit, and making the window too wide makes it impossible for the updater to work, since it can only load a maximum of 500 entries at a time.
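The failure mode can be reproduced with a toy model of a timestamp-based reader. This is only a sketch under stated assumptions, not the actual WDQS updater logic: the `poll_once` and `run` helpers, the `visible_at` field, and the one-poll-per-second loop are all hypothetical, and the 500-entry limit is not modelled. Times are seconds after 19:14:00, and the Q26272025 entry is assumed to become visible at second 27, that is, 13 seconds after its timestamp.

```python
# Toy feed: "ts" is the entry's timestamp, "visible_at" is when it actually
# shows up in the stream (hypothetical field, used only for this simulation).
FEED = [
    {"title": "Q29002776", "ts": 14, "visible_at": 14},
    {"title": "Q26272025", "ts": 14, "visible_at": 27},  # backpatched 13 s late
    {"title": "Q29002776", "ts": 15, "visible_at": 15},
    {"title": "Q29002778", "ts": 27, "visible_at": 27},
]


def poll_once(feed, now, last_end, backoff):
    """Read entries with last_end < ts <= now - backoff that exist at `now`."""
    end = now - backoff
    seen = [e for e in feed if last_end < e["ts"] <= end and e["visible_at"] <= now]
    return seen, end


def run(feed, backoff, stop=60):
    """Poll once per second and return the titles seen, in order.

    Each time window is read exactly once, so anything backpatched into a
    window after it has been read is never picked up.
    """
    last_end = -1
    titles = []
    for now in range(backoff, stop + 1):
        entries, last_end = poll_once(feed, now, last_end, backoff)
        titles.extend(e["title"] for e in entries)
    return titles


print("backoff 10:", run(FEED, backoff=10))
print("backoff 15:", run(FEED, backoff=15))
```

With a 10-second backoff the Q26272025 entry never appears in the output; with 15 seconds it does. A larger backoff only moves the threshold, though: any entry backpatched later than the backoff is still lost, which is the core of the problem described above.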

So we either need to find a way to reliably get updates without missing anything, or switch to some API that is not based on the recentchanges stream.


TASK DETAIL
https://phabricator.wikimedia.org/T161342

