[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-12-14 Thread Gehel
Gehel closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T267029

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse, Gehel
Cc: dcausse, Aklapper, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-27 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-27 Thread gerritbot
gerritbot added a comment.


  Change 642077 **merged** by jenkins-bot:
  [wikidata/query/rdf@master] Unify re-ordering and decide mutation operation
  
  https://gerrit.wikimedia.org/r/642077



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-23 Thread dcausse
dcausse added a comment.


  There were no new inconsistent events found in the past two days.



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-20 Thread dcausse
dcausse moved this task from In Progress to Needs review on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  The new approach seems to work.
  
  - Backfill period: `2020-11-06T23:00:01` -> `2020-11-20T13:40:00`
  - Dump reconciliation: `2020-11-06T23:00:01` -> `2020-11-12T03:12:51`
  
  The spurious events are almost all related to the dump reconciliation period 
(oldest to newest revision in the dumps):
  
+----+---+---+-------------------+---------+---------------+------+
|y   |m  |d  |inconsistency      |status   |event_type     |count |
+----+---+---+-------------------+---------+---------------+------+
|2020|11 |6  |newer_revision_seen|CREATED  |revision-create|14998 |
|2020|11 |7  |newer_revision_seen|CREATED  |revision-create|380191|
|2020|11 |8  |newer_revision_seen|CREATED  |revision-create|435488|
|2020|11 |9  |newer_revision_seen|CREATED  |revision-create|310343|
|2020|11 |10 |newer_revision_seen|CREATED  |revision-create|180821|
|2020|11 |10 |newer_revision_seen|UNDEFINED|page-delete    |2     |
|2020|11 |11 |newer_revision_seen|CREATED  |revision-create|136742|
|2020|11 |11 |newer_revision_seen|UNDEFINED|page-delete    |18    |
|2020|11 |12 |newer_revision_seen|CREATED  |revision-create|7234  |
|2020|11 |17 |newer_revision_seen|CREATED  |revision-create|1     |
+----+---+---+-------------------+---------+---------------+------+
  
  Note that at the time of exporting this data the pipeline had fully 
backfilled and was reading current events (2020-11-20 events).
  
  The kinds of inconsistencies we see during the reconciliation period:
  
  - page-delete -> newer_revision_seen|UNDEFINED: the item was deleted during 
that period, before it was exported to the dump (rare).
  - revision-create -> newer_revision_seen|CREATED: the revision-create event 
we read refers to a revision that was already exported in the dump (frequent).
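  
  The revision-create case above can be sketched as a check against a per-item 
state. This is only an illustration, not the streaming-updater code: the 
`State` and `RevisionCreate` classes, the `check_revision_create` function and 
the rule "flag the event when the state already holds the same or a newer 
revision" are assumptions made for the example. The page-delete case is not 
modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class State:
    """Hypothetical per-item state: last revision applied and item status."""
    rev: int
    status: str  # e.g. "CREATED"


@dataclass
class RevisionCreate:
    """Hypothetical shape of a revision-create event."""
    item: str
    revision: int


def check_revision_create(states: Dict[str, State],
                          event: RevisionCreate) -> Optional[str]:
    """Return an inconsistency label, or None when the event is applied.

    Assumption: an event is inconsistent when the state already holds the
    same or a newer revision for that item.
    """
    state = states.get(event.item)
    if state is not None and event.revision <= state.rev:
        return f"newer_revision_seen|{state.status}"
    # The event is newer than anything seen so far: record it in the state.
    states[event.item] = State(rev=event.revision, status="CREATED")
    return None


# State as it might look after loading a dump that already contains rev 120.
states = {"Q1": State(rev=120, status="CREATED")}
print(check_revision_create(states, RevisionCreate("Q1", 118)))  # newer_revision_seen|CREATED
print(check_revision_create(states, RevisionCreate("Q1", 121)))  # None
```

  With a rule like this, replaying events that overlap the dump produces one 
flagged event per revision the dump already contains, which would match the 
large counts seen during the reconciliation period.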
  
  The inconsistency on `2020-11-17` is real but in line with our expectation 
of a couple of inconsistencies per day.
  Details are:
  

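+----------+--------------------+----------+---------------+-------+----------+
|item      |event_time          |revision  |parent_revision|status |rev       |
+----------+--------------------+----------+---------------+-------+----------+
|Q102046169|2020-11-17T15:09:47Z|1308001440|1308001360     |CREATED|1308001440|
+----------+--------------------+----------+---------------+-------+----------+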
  
  This seems to indicate a duplicate event sent by changeprop (a revision-create 
for 1308001440 arrived while 1308001440 was already in the state). I'll let the 
pipeline run over the weekend.


WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-19 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-19 Thread gerritbot
gerritbot added a comment.


  Change 642077 had a related patch set uploaded (by DCausse; owner: DCausse):
  [wikidata/query/rdf@master] Unify re-ordering and decide mutation operation
  
  https://gerrit.wikimedia.org/r/642077



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-19 Thread dcausse
dcausse updated the task description.



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-19 Thread dcausse
dcausse added a comment.


  After a test run it seems that we are able to backfill. Unfortunately, we skip 
a non-negligible number of revisions:
  
+----+---+---+---+-------------------+-------+---------------+-----+
|y   |m  |d  |h  |inconsistency      |status |event_type     |count|
+----+---+---+---+-------------------+-------+---------------+-----+
|2020|11 |18 |15 |newer_revision_seen|CREATED|revision-create|190  |
|2020|11 |18 |16 |newer_revision_seen|CREATED|revision-create|85   |
|2020|11 |18 |17 |newer_revision_seen|CREATED|revision-create|406  |
|2020|11 |18 |18 |newer_revision_seen|CREATED|revision-create|584  |
|2020|11 |18 |19 |newer_revision_seen|CREATED|revision-create|333  |
|2020|11 |18 |20 |newer_revision_seen|CREATED|revision-create|361  |
|2020|11 |18 |21 |newer_revision_seen|CREATED|revision-create|86   |
|2020|11 |18 |22 |newer_revision_seen|CREATED|revision-create|278  |
|2020|11 |18 |23 |newer_revision_seen|CREATED|revision-create|63   |
|2020|11 |19 |0  |newer_revision_seen|CREATED|revision-create|110  |
|2020|11 |19 |1  |newer_revision_seen|CREATED|revision-create|42   |
|2020|11 |19 |2  |newer_revision_seen|CREATED|revision-create|48   |
|2020|11 |19 |3  |newer_revision_seen|CREATED|revision-create|18   |
|2020|11 |19 |4  |newer_revision_seen|CREATED|revision-create|13   |
|2020|11 |19 |5  |newer_revision_seen|CREATED|revision-create|94   |
|2020|11 |19 |6  |newer_revision_seen|CREATED|revision-create|27   |
|2020|11 |19 |7  |newer_revision_seen|CREATED|revision-create|148  |
|2020|11 |19 |8  |newer_revision_seen|CREATED|revision-create|34   |
+----+---+---+---+-------------------+-------+---------------+-----+
  
  These are revision-create events that we received after we had already 
received a newer revision (the few cases I checked manually were out of order 
by 1 to 30 seconds).
  For the WDQS use case I think it's not a big deal to skip a few revisions 
(diffing with rev < N-1), but this might be confusing for other use cases we 
don't have yet.
  I'm a bit undecided here. One solution could be to delay revisions where we 
know that there is likely one in between (current != rev_parent_id), but this 
moves the buffering logic to the operator which has the entity->rev state.
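  
  A minimal sketch of the delaying idea described above, assuming a per-item 
buffer kept next to the item -> revision state. It is not taken from the 
patches referenced in this thread: `Rev`, `ReorderingOperator` and `max_delay` 
are hypothetical names, and the 60 second bound is an arbitrary value chosen to 
be larger than the 1 to 30 seconds of disorder reported above. Time only 
advances when another event for the same item arrives; a real stream operator 
would more likely rely on timers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rev:
    """Hypothetical revision-create event."""
    item: str
    revision: int
    parent: int        # rev_parent_id
    event_time: float  # seconds


@dataclass
class ReorderingOperator:
    max_delay: float = 60.0  # assumed upper bound on how long to wait
    current: Dict[str, int] = field(default_factory=dict)     # item -> rev
    pending: Dict[str, List[Rev]] = field(default_factory=dict)
    emitted: List[Rev] = field(default_factory=list)

    def on_event(self, ev: Rev) -> None:
        cur = self.current.get(ev.item)
        if cur is not None and ev.revision <= cur:
            return  # late or duplicate event; a real pipeline would log it
        if cur is None or ev.parent == cur:
            self._emit(ev)
        else:
            # current != rev_parent_id: a revision in between is likely
            # still on its way, so hold this one back.
            self.pending.setdefault(ev.item, []).append(ev)
        self._flush(ev.item, now=ev.event_time)

    def _emit(self, ev: Rev) -> None:
        self.current[ev.item] = ev.revision
        self.emitted.append(ev)

    def _flush(self, item: str, now: float) -> None:
        queue = sorted(self.pending.pop(item, []), key=lambda e: e.revision)
        # Everything up to the newest expired revision is released in order,
        # accepting that the missing revisions in between are skipped.
        expired_up_to = max(
            (e.revision for e in queue if now - e.event_time >= self.max_delay),
            default=-1,
        )
        kept = []
        for ev in queue:
            if ev.revision <= self.current[item]:
                continue  # superseded while waiting
            if ev.parent == self.current[item] or ev.revision <= expired_up_to:
                self._emit(ev)
            else:
                kept.append(ev)
        if kept:
            self.pending[item] = kept


op = ReorderingOperator()
op.on_event(Rev("Q1", 10, 9, 0.0))
op.on_event(Rev("Q1", 12, 11, 5.0))  # arrives before its parent, held back
op.on_event(Rev("Q1", 11, 10, 6.0))  # fills the gap and releases rev 12
print([e.revision for e in op.emitted])  # [10, 11, 12]
```

  The trade-off is the one mentioned above: the buffer has to live in the 
operator that owns the item -> revision state, and revisions held back add 
latency of up to `max_delay` when the missing parent never shows up.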



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-18 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-18 Thread gerritbot
gerritbot added a comment.


  Change 640444 **merged** by jenkins-bot:
  [wikidata/query/rdf@master] Add support for partial re-ordering
  
  https://gerrit.wikimedia.org/r/640444



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-17 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-17 Thread gerritbot
gerritbot added a comment.


  Change 640444 had a related patch set uploaded (by DCausse; owner: DCausse):
  [wikidata/query/rdf@master] Add support for partial re-ordering
  
  https://gerrit.wikimedia.org/r/640444



[Wikidata-bugs] [Maniphest] T267029: The streaming-updater-producer should handle backfills gracefully

2020-11-17 Thread dcausse
dcausse renamed this task from "Tune the streaming-updater-producer to limit 
late events" to "The streaming-updater-producer should handle backfills 
gracefully".
dcausse updated the task description.
