[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully
Gehel closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T267029 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse, Gehel Cc: dcausse, Aklapper, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
Maintenance_bot removed a project: Patch-For-Review.
gerritbot added a comment. Change 642077 **merged** by jenkins-bot: [wikidata/query/rdf@master] Unify re-ordering and decide mutation operation https://gerrit.wikimedia.org/r/642077
dcausse added a comment. There were no new inconsistent events found in the past two days.
dcausse moved this task from In Progress to Needs review on the Discovery-Search (Current work) board. dcausse added a comment. The new approach seems to work.
- Backfill period: `2020-11-06T23:00:01` -> `2020-11-20T13:40:00`
- Dump reconciliation: `2020-11-06T23:00:01` -> `2020-11-12T03:12:51`
The spurious events are almost all related to the dump reconciliation period (the span from the oldest to the newest revision in the dump):
+----+---+---+-------------------+---------+---------------+------+
|y   |m  |d  |inconsistency      |status   |event_type     |count |
+----+---+---+-------------------+---------+---------------+------+
|2020|11 |6  |newer_revision_seen|CREATED  |revision-create|14998 |
|2020|11 |7  |newer_revision_seen|CREATED  |revision-create|380191|
|2020|11 |8  |newer_revision_seen|CREATED  |revision-create|435488|
|2020|11 |9  |newer_revision_seen|CREATED  |revision-create|310343|
|2020|11 |10 |newer_revision_seen|CREATED  |revision-create|180821|
|2020|11 |10 |newer_revision_seen|UNDEFINED|page-delete    |2     |
|2020|11 |11 |newer_revision_seen|CREATED  |revision-create|136742|
|2020|11 |11 |newer_revision_seen|UNDEFINED|page-delete    |18    |
|2020|11 |12 |newer_revision_seen|CREATED  |revision-create|7234  |
|2020|11 |17 |newer_revision_seen|CREATED  |revision-create|1     |
+----+---+---+-------------------+---------+---------------+------+
Note that at the time this data was exported, the pipeline had fully backfilled and was reading current events (2020-11-20 events).
The kinds of inconsistencies we see during the reconciliation period:
- page-delete -> newer_revision_seen|UNDEFINED: the item was deleted during that period, before it was exported to the dump (rare).
- revision-create -> newer_revision_seen|CREATED: the revision-create event we read had already been exported in the dump (frequent).
The inconsistency on `2020-11-17` is real, but in line with our expectation of a couple of inconsistencies per day. Details are:
+----------+--------------------+----------+---------------+-------+----------+
|item      |event_time          |revision  |parent_revision|status |rev       |
+----------+--------------------+----------+---------------+-------+----------+
|Q102046169|2020-11-17T15:09:47Z|1308001440|1308001360     |CREATED|1308001440|
+----------+--------------------+----------+---------------+-------+----------+
This seems to indicate a duplicate event sent by changeprop (a revision-create for 1308001440 while 1308001440 is already in the state). I'll let the pipeline run over the weekend.
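
A minimal Python sketch of the check behind the `newer_revision_seen` label, as these comments describe it: the producer keeps the latest revision per entity (bootstrapped from the dump, then advanced by the stream) and flags a revision-create whose revision is not newer than the one in the state. `RevisionCreate` and `classify` are hypothetical names, not the streaming-updater code. Only revision-create events are modeled, and treating an equal revision as `newer_revision_seen` is an inference from the 2020-11-17 duplicate being reported under that label.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RevisionCreate:
    """Hypothetical revision-create event; fields mirror the table columns."""
    item: str
    revision: int
    parent_revision: int


def classify(state: Dict[str, int], event: RevisionCreate) -> Optional[str]:
    """Return an inconsistency label, or None if the event was applied.

    `state` maps an entity id to the latest revision known for it,
    e.g. bootstrapped from the dump and then advanced by the stream.
    """
    current = state.get(event.item)
    if current is not None and event.revision <= current:
        # Same or newer revision already in the state: the revision was
        # already exported in the dump, or changeprop sent a duplicate.
        return "newer_revision_seen"
    state[event.item] = event.revision
    return None


# State after revision 1308001440 of Q102046169 was first applied.
state = {"Q102046169": 1308001440}

# Replaying the same revision-create (the duplicate) is flagged.
dup = RevisionCreate("Q102046169", 1308001440, 1308001360)
print(classify(state, dup))  # newer_revision_seen

# A newer revision (hypothetical id) is applied and advances the state.
nxt = RevisionCreate("Q102046169", 1308001500, 1308001440)
print(classify(state, nxt))  # None
print(state["Q102046169"])  # 1308001500
```

This also illustrates why the dump reconciliation period produces so many flagged events: any revision-create whose revision is older than or equal to the one exported in the dump is rejected exactly like a duplicate.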
gerritbot added a project: Patch-For-Review.
gerritbot added a comment. Change 642077 had a related patch set uploaded (by DCausse; owner: DCausse): [wikidata/query/rdf@master] Unify re-ordering and decide mutation operation https://gerrit.wikimedia.org/r/642077
dcausse updated the task description.
dcausse added a comment. After a test run it seems that we are able to backfill; unfortunately, we skip a non-negligible number of revisions:
+----+---+---+---+-------------------+-------+---------------+-----+
|y   |m  |d  |h  |inconsistency      |status |event_type     |count|
+----+---+---+---+-------------------+-------+---------------+-----+
|2020|11 |18 |15 |newer_revision_seen|CREATED|revision-create|190  |
|2020|11 |18 |16 |newer_revision_seen|CREATED|revision-create|85   |
|2020|11 |18 |17 |newer_revision_seen|CREATED|revision-create|406  |
|2020|11 |18 |18 |newer_revision_seen|CREATED|revision-create|584  |
|2020|11 |18 |19 |newer_revision_seen|CREATED|revision-create|333  |
|2020|11 |18 |20 |newer_revision_seen|CREATED|revision-create|361  |
|2020|11 |18 |21 |newer_revision_seen|CREATED|revision-create|86   |
|2020|11 |18 |22 |newer_revision_seen|CREATED|revision-create|278  |
|2020|11 |18 |23 |newer_revision_seen|CREATED|revision-create|63   |
|2020|11 |19 |0  |newer_revision_seen|CREATED|revision-create|110  |
|2020|11 |19 |1  |newer_revision_seen|CREATED|revision-create|42   |
|2020|11 |19 |2  |newer_revision_seen|CREATED|revision-create|48   |
|2020|11 |19 |3  |newer_revision_seen|CREATED|revision-create|18   |
|2020|11 |19 |4  |newer_revision_seen|CREATED|revision-create|13   |
|2020|11 |19 |5  |newer_revision_seen|CREATED|revision-create|94   |
|2020|11 |19 |6  |newer_revision_seen|CREATED|revision-create|27   |
|2020|11 |19 |7  |newer_revision_seen|CREATED|revision-create|148  |
|2020|11 |19 |8  |newer_revision_seen|CREATED|revision-create|34   |
+----+---+---+---+-------------------+-------+---------------+-----+
These are revision-create events we receive after having already received a newer revision for the same entity (the few cases I checked manually were out of order by 1 to 30 seconds). For the wdqs use case I think skipping a few revisions is not a big deal (revision N then gets diffed against a revision < N-1), but this might be confusing for other use cases we don't have yet.
I'm a bit undecided here. One solution could be to delay revisions when we know there is likely one in between (current != rev_parent_id), but this moves the buffering logic into the operator that holds the entity->rev state.
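
The delaying idea above could look roughly like the following Python sketch. It is not the streaming-updater code: `GapBuffer`, `on_event`, `on_time` and the 30-second `max_delay` are hypothetical, with the delay only chosen to match the 1 to 30 second disorder observed. A revision whose `parent_revision` differs from the revision in the state is held back until either the missing revision arrives or the delay expires, in which case it is applied anyway and the gap is accepted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class RevisionCreate:
    """Hypothetical revision-create event (times in seconds)."""
    item: str
    revision: int
    parent_revision: int
    event_time: float


@dataclass
class GapBuffer:
    """Hypothetical keyed operator holding entity -> latest revision state.

    A revision whose parent is not the current revision is held back,
    since the revision in between is probably still in flight.
    """
    max_delay: float = 30.0  # assumption, matches the 1-30 s disorder seen
    state: Dict[str, int] = field(default_factory=dict)
    pending: Dict[str, List[RevisionCreate]] = field(default_factory=dict)
    emitted: List[RevisionCreate] = field(default_factory=list)

    def _apply(self, ev: RevisionCreate) -> None:
        self.state[ev.item] = ev.revision
        self.emitted.append(ev)

    def _drain(self, item: str) -> None:
        # Release buffered revisions that now chain onto the state.
        queue = self.pending.get(item, [])
        while True:
            nxt = next((e for e in queue
                        if e.parent_revision == self.state.get(item)), None)
            if nxt is None:
                return
            queue.remove(nxt)
            self._apply(nxt)

    def on_event(self, ev: RevisionCreate) -> None:
        current = self.state.get(ev.item)
        if current is not None and ev.revision <= current:
            return  # newer_revision_seen: skip it
        if current is None or ev.parent_revision == current:
            self._apply(ev)
            self._drain(ev.item)
        else:
            # Likely a revision in between: wait for it.
            self.pending.setdefault(ev.item, []).append(ev)

    def on_time(self, now: float) -> None:
        # Stop waiting for expired revisions and apply them anyway,
        # accepting a diff against a revision < N-1.
        for item, queue in self.pending.items():
            for ev in sorted(queue, key=lambda e: e.revision):
                if now - ev.event_time >= self.max_delay:
                    queue.remove(ev)
                    if ev.revision > self.state.get(item, -1):
                        self._apply(ev)
            self._drain(item)


buf = GapBuffer()
buf.on_event(RevisionCreate("Q1", 10, 9, 0.0))
buf.on_event(RevisionCreate("Q1", 12, 11, 5.0))  # parent 11 missing: held
buf.on_event(RevisionCreate("Q1", 11, 10, 6.0))  # fills the gap, 12 follows
print([e.revision for e in buf.emitted])  # [10, 11, 12]
```

In Flink this would most likely end up in a keyed process function with event-time timers, which is the concern raised above: the buffer becomes part of the same keyed state as the entity->rev mapping, and its size grows with the amount of disorder.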
Maintenance_bot removed a project: Patch-For-Review.
gerritbot added a comment. Change 640444 **merged** by jenkins-bot: [wikidata/query/rdf@master] Add support for partial re-ordering https://gerrit.wikimedia.org/r/640444
gerritbot added a project: Patch-For-Review.
gerritbot added a comment. Change 640444 had a related patch set uploaded (by DCausse; owner: DCausse): [wikidata/query/rdf@master] Add support for partial re-ordering https://gerrit.wikimedia.org/r/640444
dcausse renamed this task from "Tune the streaming-updater-producer to limit late events" to "The streaming-updater-producer should handle backfills gracefully". dcausse updated the task description.