aaron added a comment.
Ignored purges still count as work items, yes.
Rebound purges could explain some of the number. Also, given the backlog, many of them probably did have different rootJobTimestamps. MediaWiki can de-duplicate purges when the same backlinked page X is edited several times, by ignoring the jobs with older timestamps. It's trickier when templates A and B are both edited and their backlinks overlap. Sometimes that gets caught; other times both purges to the same page happen.
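As a rough illustration of that distinction, here is a minimal Python sketch, not MediaWiki's actual code. The names `PurgeJob`, `RootJobRegistry` and `run_jobs` are hypothetical, and it assumes a root job signature identifies "backlinks of one edited template". A job is skipped only when a newer root job with the same signature exists, so repeated edits to template A collapse, while edits to A and B that share a backlinked page do not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurgeJob:
    root_signature: str  # assumed: identifies the edited template, e.g. "A"
    root_timestamp: int  # time of the edit that spawned this job
    page: str            # backlinked page to purge


class RootJobRegistry:
    """Tracks the newest root timestamp seen per signature (hypothetical helper)."""

    def __init__(self):
        self._latest = {}

    def register(self, signature, timestamp):
        if timestamp > self._latest.get(signature, -1):
            self._latest[signature] = timestamp

    def is_superseded(self, job):
        # Only jobs with the SAME signature and an older timestamp are dropped.
        return job.root_timestamp < self._latest.get(job.root_signature, -1)


def run_jobs(jobs, registry):
    purged = []
    for job in jobs:
        if registry.is_superseded(job):
            continue  # an newer edit to the same template already covers this
        purged.append((job.page, job.root_timestamp))
    return purged


# Template A edited at t=1 and t=2, template B at t=3; page X uses both.
jobs = [PurgeJob("A", 1, "X"), PurgeJob("A", 2, "X"), PurgeJob("B", 3, "X")]
registry = RootJobRegistry()
for j in jobs:
    registry.register(j.root_signature, j.root_timestamp)

purged = run_jobs(jobs, registry)
print(purged)  # [('X', 2), ('X', 3)]
```

The older A job is dropped, but X is still purged once for A and once for B, because nothing ties the two signatures together.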
If the htmlCacheUpdate queue were LIFO instead of FIFO, then the purges with higher timestamps would run first more often and the lower ones would no-op given the SELECT query... that might be where the most de-duplication opportunities are missed. Right now it mostly relies on non-parallel execution of jobs causing the range->root job division and the leaf job execution for different template/file edits to be intertwined. Whether the job with the higher rootJobTimestamp runs first or vice versa is down to luck. When it's the former, the purge is de-duplicated at the DB/CDN layer. Making that queue LIFO would, however, nullify the rootJobSignature/timestamp de-duplication (e.g. for several edits to template A).
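To make the ordering effect concrete, here is a small Python sketch. It assumes the purge only writes when the stored page_touched value is older than the job's timestamp. That condition is my reading of "the SELECT query", and `apply_purges` is a made-up name, not a MediaWiki function.

```python
def apply_purges(purges):
    """Apply (page, timestamp) purges in order; return final state and write count."""
    page_touched = {}
    writes = 0
    for page, ts in purges:
        # Assumed condition: only touch (and purge) if the page is older than ts.
        if page_touched.get(page, 0) < ts:
            page_touched[page] = ts
            writes += 1
        # Otherwise the job is a no-op: a newer purge already covered this page.
    return page_touched, writes


purges = [("X", 1), ("X", 2)]

oldest_first_state, oldest_first_writes = apply_purges(purges)
newest_first_state, newest_first_writes = apply_purges(list(reversed(purges)))

print(oldest_first_writes)  # 2 (both purges hit the DB/CDN)
print(newest_first_writes)  # 1 (the older purge no-ops)
```

Both orders end with the same page_touched value; only the number of real purges differs.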
I guess, visually, the limitations on per-page deduplication look like this:
Edit to A (t1):
  Queue: JobA, <prior jobs>   [tail: left, head: right]
Edit to B (t2):
  Queue: JobB, <prior jobs>, JobA, <prior jobs>
As jobs run:
  Queue: JobAremnant, JobAleaf1, ..., JobAleaf500, <prior jobs>, JobB, <prior jobs>
  Queue: JobBremnant, JobBleaf1, ..., JobBleaf500, <prior jobs>, JobAremnant, JobAleaf1, ..., JobAleaf500, <prior jobs>
So the page A jobs from t1 run and *then*, later, the B jobs from t2. This tends to repeat as the remnant jobs divide up into leaf jobs. Any common pages in those leaf jobs will likely have page_touched hit twice (first with t1 and then with t2). The queue doesn't "know" that a later job will touch some of the same pages with a higher value, obviating the need for the first purges (aside from avoiding purge starvation in pathological cases).
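The same behaviour can be reproduced with a toy FIFO simulation in Python. This is only a sketch under simplifying assumptions: `simulate` is a hypothetical helper, each leaf job covers a single page, a root job splits off `batch_size` leaf jobs per run and re-enqueues the rest as a remnant, and a leaf only writes when page_touched is older than its timestamp.

```python
from collections import deque


def simulate(edits, batch_size=2):
    """edits: list of (template, timestamp, backlinked_pages), in edit order.

    Returns how many times each page was actually touched.
    """
    queue = deque()  # append() pushes at the tail, popleft() pops the head
    for template, ts, pages in edits:
        queue.append(("root", template, ts, list(pages)))

    page_touched = {}
    touches = {}
    while queue:
        kind, template, ts, pages = queue.popleft()
        if kind == "root":
            # Split off a batch of leaf jobs, then re-enqueue the remnant.
            batch, rest = pages[:batch_size], pages[batch_size:]
            for page in batch:
                queue.append(("leaf", template, ts, [page]))
            if rest:
                queue.append(("root", template, ts, rest))
        else:
            page = pages[0]
            if page_touched.get(page, 0) < ts:
                page_touched[page] = ts
                touches[page] = touches.get(page, 0) + 1
    return touches


# Template A (t=1) and template B (t=2) share backlinks P2 and P3.
touches = simulate([("A", 1, ["P1", "P2", "P3"]), ("B", 2, ["P2", "P3", "P4"])])
print(touches)  # {'P1': 1, 'P2': 2, 'P3': 1, 'P4': 1}
```

In this run P2 is touched twice because A's leaf job reaches it first, while P3 is touched only once because B's leaf job happens to run before A's remnant produces its P3 leaf. Which case a shared page falls into depends purely on how the jobs interleave.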
