dcausse created this task.
dcausse added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.
TASK DESCRIPTION

As a maintainer of the wdqs streaming updater I want requests to Special:EntityData receiving a 404 response to be retried so that there are fewer items to reconcile (T279541 <https://phabricator.wikimedia.org/T279541>).

There is a race between the events flowing to Kafka and MySQL replication. This race might cause events to be processed before the data they point to is available on the MySQL replica being reached.

One simple way to circumvent the problem would be to retry on 404. The retry could be guarded by a check on the difference between the processing time and the event time: if the difference is less than e.g. 10 seconds then a retry is performed.

Looking at the side output data of the streaming updater for the first seven days of April we see (//range// is the delta between the ingestion time and the processing time):

+-------+------+
|range  |events|
+-------+------+
|0: 0-1s|65    |
|1: 1-3s|137   |
|2: 3-5s|38    |
|3: 5-7s|9     |
+-------+------+

which translates to: over the 8 days of Wikidata edits, 249 events failed with a 404 although the data is now actually available (most probably due to replication lag), and these events were ingested between 0 and 7 seconds after their event time.

There are 141 events for which we received a 404 that is still a 404 now:

+--------+------+
|range   |events|
+--------+------+
|1: 1-3s |4     |
|3: 5-7s |1     |
|4: 7-10s|2     |
|5: >10s |134   |
+--------+------+

So retrying on 404 for events with a `processing_time - event_time < 10 seconds` seems the right threshold: it will cause extra latency only for a few hundred events per week.
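The guarded retry described above can be sketched as follows. This is only an illustrative Python sketch, not the updater's actual code: the names `FetchResult`, `should_retry_404` and `fetch_with_404_retry`, the one-second `retry_delay`, and the injected `clock`/`sleep` callables are all assumptions made for the example. Only the 10 second window comes from the task description.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Threshold proposed in the task; every other name here is hypothetical.
RETRY_WINDOW_SECONDS = 10.0


@dataclass
class FetchResult:
    """Outcome of one (hypothetical) request to Special:EntityData."""
    status: int
    body: Optional[str] = None


def should_retry_404(event_time: float, processing_time: float,
                     window: float = RETRY_WINDOW_SECONDS) -> bool:
    """True while processing_time - event_time is below the window."""
    return (processing_time - event_time) < window


def fetch_with_404_retry(fetch: Callable[[], FetchResult],
                         event_time: float,
                         clock: Callable[[], float] = time.time,
                         sleep: Callable[[float], None] = time.sleep,
                         retry_delay: float = 1.0,
                         window: float = RETRY_WINDOW_SECONDS) -> FetchResult:
    """Call fetch() and retry on 404 only while the event is recent.

    Any non-404 result is returned immediately. A 404 is returned as-is
    once the event is at least `window` seconds old, so a genuinely
    missing entity is not retried forever.
    """
    while True:
        result = fetch()
        if result.status != 404:
            return result
        if not should_retry_404(event_time, clock(), window):
            return result
        sleep(retry_delay)
```

With this shape, an event that is already more than 10 seconds old when its first 404 arrives (the `>10s` bucket above) is not retried at all, while recent events are retried until the replica catches up or the window expires.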
AC:
- retry on 404 until the event time is 10 seconds older than the processing time

TASK DETAIL
https://phabricator.wikimedia.org/T279698

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Aklapper, Invadibot, MPhamWMF, maantietaja, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
