dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  As a maintainer of the WDQS streaming updater, I want requests to 
Special:EntityData that receive a 404 response to be retried so that there are 
fewer items to reconcile (T279541 <https://phabricator.wikimedia.org/T279541>).
  
  There is a race between the events flowing to Kafka and MySQL replication. 
This race might cause the events to be processed before the data they point to 
is available on the MySQL replica that is queried.
  
  One simple way to circumvent the problem would be to retry on 404. The retry 
could be guarded by a check on the difference between the processing time and 
the event time: if the difference is less than e.g. 10 seconds, a retry is 
performed.
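
  A minimal sketch of such a guard, assuming timestamps are available as 
`datetime` values. The names `should_retry` and `RETRY_WINDOW` are hypothetical 
and are not taken from the streaming updater code:

```python
from datetime import datetime, timedelta, timezone

# Assumed window, taken from the "e.g. 10 seconds" suggested in this task.
RETRY_WINDOW = timedelta(seconds=10)


def should_retry(status_code: int,
                 event_time: datetime,
                 processing_time: datetime,
                 window: timedelta = RETRY_WINDOW) -> bool:
    """Return True when a 404 is recent enough to plausibly be replication lag."""
    if status_code != 404:
        return False
    # Only retry while the event is younger than the window.
    return processing_time - event_time < window


if __name__ == "__main__":
    event = datetime(2021, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(should_retry(404, event, event + timedelta(seconds=3)))   # True
    print(should_retry(404, event, event + timedelta(seconds=30)))  # False
```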
  
  Looking at the side output data of the streaming updater for the first seven 
days of April, we see (//range// is the delta between the ingestion time and 
the processing time):
  
    +-------+------+
    |range  |events|
    +-------+------+
    |0: 0-1s|65    |
    |1: 1-3s|137   |
    |2: 3-5s|38    |
    |3: 5-7s|9     |
    +-------+------+
  
  which translates to: over this period of Wikidata edits, 249 events failed 
with a 404 although the data is actually available now (most probably due to 
replication lag). These events were ingested between 0 and 7 seconds after 
their event time.
  
  There are 141 events for which we received a 404 that is still a 404 now:
  
    +--------+------+
    |range   |events|
    +--------+------+
    |1: 1-3s |4     |
    |3: 5-7s |1     |
    |4: 7-10s|2     |
    |5: >10s |134   |
    +--------+------+
  
  So retrying 404s for events with `processing_time - event_time < 10 seconds` 
seems the right threshold: it will add extra latency for only a few hundred 
events per week.
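
  The arithmetic behind that estimate can be checked with a short script. The 
counts are copied from the two tables in this task; treating every event whose 
data is available now as one that a retry would have recovered is an 
assumption, since the data may have arrived later than 10 seconds:

```python
# Counts copied from the two tables, keyed by delay range.
recovered = {"0-1s": 65, "1-3s": 137, "3-5s": 38, "5-7s": 9}
still_404 = {"1-3s": 4, "5-7s": 1, "7-10s": 2, ">10s": 134}

total_recovered = sum(recovered.values())  # 249
total_still_404 = sum(still_404.values())  # 141

# A "< 10 seconds" guard retries every recovered event (all under 7s)
# plus the still-404 events that fall outside the ">10s" bucket.
retried_in_vain = total_still_404 - still_404[">10s"]  # 7
retried = total_recovered + retried_in_vain            # 256

print(f"404 events in total: {total_recovered + total_still_404}")  # 390
print(f"events retried:      {retried}")                            # 256
print(f"retried in vain:     {retried_in_vain}")                    # 7
```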
  
  AC:
  
  - retry on 404 until the event time is 10 seconds older than the processing 
time
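
  One way to read this criterion is as a bounded retry loop. The sketch below 
is only an illustration: `fetch_with_404_retry`, its parameters and the 
1-second pause between attempts are all hypothetical, and `fetch` stands in 
for whatever performs the Special:EntityData request and returns its HTTP 
status code:

```python
import time
from typing import Callable


def fetch_with_404_retry(fetch: Callable[[], int],
                         event_time: float,
                         max_age: float = 10.0,
                         pause: float = 1.0,
                         now: Callable[[], float] = time.time,
                         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() until it stops returning 404 or the event is max_age old.

    event_time and now() are seconds since the epoch. The clock and sleep
    functions are injectable so the loop can be exercised without waiting.
    """
    while True:
        status = fetch()
        # Stop on any non-404 answer, or once the event is too old to retry.
        if status != 404 or now() - event_time >= max_age:
            return status
        sleep(pause)
```

  Because the age check happens after each request, an event that is already 
older than 10 seconds is fetched once and never retried.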

TASK DETAIL
  https://phabricator.wikimedia.org/T279698

To: dcausse
Cc: dcausse, Aklapper, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, 
Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, 
QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, 
Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331