| aaron added a comment. |
As far as retries go, the attempts hash for wikidatawiki:htmlCacheUpdate has a few entries, with run counts no greater than 3. The only incrementing code is doPop() in MediaWiki, the same code that made them reach 3 in the first place. If the same jobs had run many times, I'd expect very high values there.
> aaron@terbium:~$ mwscript eval.php wikidatawiki >
> error_reporting( E_ALL );
> require("/home/aaron/eval_job_check.php");
> foreach ( $wmfLocalServices['jobqueue_redis'] as $tag => $host ) { sanityCheckJQHost( $host, wfWikiId(), 'htmlCacheUpdate' ); }
array(6) {
["743f54ce7b8843d8b6e4ec081f633508"]=>
string(1) "3"
["ee20490772484aae905592ce6a4bc22c"]=>
string(1) "3"
["a45d1c46edc8450a90da89668cbe1924"]=>
string(1) "3"
["0083c49d9dec492d99ee7ea95ab25403"]=>
string(1) "3"
["b1f4cb9f1b9c4402b9f8da2348d6a46f"]=>
string(1) "3"
["2edd120f3b1a42edb3645d2dd777bf81"]=>
string(1) "3"
}
array(3) {
["65d41242504d4e4198b1213da1d3536c"]=>
string(1) "3"
["c2ceaffe86274a56b3b491899e3e3594"]=>
string(1) "3"
["f38d9c0116e7438b9c8d9a8ae6f9430e"]=>
string(1) "3"
}
array(3) {
["720afb9160b542b896820a8d069910c2"]=>
string(1) "3"
["3407d8dd224840c2bf79c36b55bc311a"]=>
string(1) "3"
["1f67fd5e59914a4686bee0877c4b935f"]=>
string(1) "3"
}
array(2) {
["9aa931c3f3444cc0bd9bfa8ff3097062"]=>
string(1) "3"
["a5bc6d9346f84a87ad4829edf096b977"]=>
string(1) "3"
}
array(1) {
["46677062e9e74d048541f1b8dab3c63a"]=>
string(1) "3"
}
array(3) {
["45b63ee504dd4f798956f6900079f452"]=>
string(1) "3"
["7992a032bebf45b9a686991dc29a24b4"]=>
string(1) "3"
["935f44b5c3d64dd392f29e8e8e94963b"]=>
string(1) "3"
}
array(3) {
["ef36284c42da45cfa667419c820d17c6"]=>
string(1) "3"
["6803b9c714b545a59d1830c5ab55ec60"]=>
string(1) "3"
["832aa8f83c1f475dabc56e256f22ea84"]=>
string(1) "3"
}
array(2) {
["dc12814a0b214c6d94f054aca4201115"]=>
string(1) "3"
["60fc4b1e6b354189982add7dfabccf25"]=>
string(1) "3"
}
array(4) {
["935610fad21c4d2eb8336cb594f57afb"]=>
string(1) "3"
["0a4bfacccd8d48258f8b7689b99f3180"]=>
string(1) "3"
["a381aed77ce94bec872aaebf8b96016b"]=>
string(1) "3"
["e8b9c16c9d3848c38ab0c44556a7d2e4"]=>
string(1) "3"
}
array(4) {
["6f4dd16a084d486dab52658a4ea54c37"]=>
string(1) "3"
["0e7f6a92e6eb4bb8a121047f869c3f6e"]=>
string(1) "3"
["fd18564d792f4d9f82f45a1e42c46973"]=>
string(1) "3"
["3b786e6a7dfd4f2fb4a9f924f160fcba"]=>
string(1) "3"
}
array(4) {
["0b13d238fa554706a08a5b2160a66e1e"]=>
string(1) "3"
["8c35814276f04985b7158081acfb8dbf"]=>
string(1) "3"
["1c3accd2123a4159aa7ee2e95628ad29"]=>
string(1) "3"
["d5a4d5bb391d4192ab1af5a9caee9f46"]=>
string(1) "3"
}
array(3) {
["ce5df11aaecc4bf9a641787c9bc41e9e"]=>
string(1) "3"
["16bd168b60e348bfab39e7a8921a99a1"]=>
string(1) "3"
["93ff062e5b00463e9efcda7604274112"]=>
string(1) "3"
}
A page with 1 million backlinks would have (job divisions) × (leaf jobs per division) + (jobs that just divide into other jobs) = (1e6 / 300) × 3 + (1e6 / 300) = ~13334 job runs (if none failed), and they would all have the same rootJobTimestamp. The number of jobs with the same minute prefix would be higher (with different rootJobSignature values, though). The only thing odd about the table @GWicke posted is how old some root job descendants are.
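The arithmetic above can be sketched as a small calculation. This is illustrative only; the division batch size (300) and leaf-job count (3) are taken from the figures in this comment, not read from the actual MediaWiki configuration.

```python
# Toy model of the job-run arithmetic for a page with many backlinks.
# Assumptions (from the comment, not from config): each division covers
# 300 backlinks and spawns 3 leaf jobs plus, if work remains, a remnant
# job that divides again.
def total_job_runs(backlinks, division_batch=300, leaf_jobs_per_division=3):
    divisions = backlinks / division_batch          # remnant jobs that re-divide
    leaf_jobs = divisions * leaf_jobs_per_division  # jobs doing the actual purges
    return round(divisions + leaf_jobs)

print(total_job_runs(1_000_000))  # → 13333, all sharing one rootJobTimestamp
```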
Since job divisions go to the end of the queue (like any other pushed job), timing is trickier to reason about. The oldest job in the queue might belong to a page with a lot of backlinks. Each division puts the leaf jobs and the remnant job (the one that divides again) at the end of the queue, so the runners have to burn through the queue to reach the remnant. The cycle repeats until the work is done. When the queue has any serious length, this means it can take a long time to finish an old template backlink refresh/purge. During the backlog increase, jobs kept piling up, so each iteration of an old many-backlink job took a long time just to reach the next division, stretching it out even further than a one-off backlog of back-to-back jobs would.
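The re-division cycle can be modeled with a toy FIFO queue. This is a deliberately simplified sketch (not MediaWiki's actual Redis queue): it shows that a remnant job which keeps re-dividing must wait behind everything pushed ahead of it on every pass, which is why a many-backlink refresh drags on when the queue is long.

```python
from collections import deque

# Toy FIFO model of leaf/remnant division. Assumptions: 300 backlinks
# handled per division, 3 leaf jobs spawned each time, and some number
# of unrelated jobs already ahead of the remnant in the queue.
def passes_to_finish(backlinks, batch=300, leaves_per_division=3, other_jobs=1000):
    queue = deque(['other'] * other_jobs + [('remnant', backlinks)])
    passes = 0
    while queue:
        job = queue.popleft()
        if isinstance(job, tuple):            # the remnant job came up again
            passes += 1
            remaining = job[1] - batch
            queue.extend(['leaf'] * leaves_per_division)  # leaves go to the back
            if remaining > 0:
                queue.append(('remnant', remaining))      # so does the remnant
    return passes

print(passes_to_finish(1_000_000))  # → 3334 trips through the queue
```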
In any case, if there were a loop, it would probably be in the job division itself. The code for that lives largely in BacklinkJobUtils, which both htmlCacheUpdate and refreshLinks use.
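For reference, the leaf-plus-remnant partition pattern looks roughly like this. The function name and shapes here are illustrative, not the actual BacklinkJobUtils::partitionBacklinkJob signature; it just shows how one batch of backlinks splits into leaf jobs plus a remnant carrying the rest.

```python
# Rough sketch of the divide-into-leaf-plus-remnant pattern used by
# backlink jobs. Names and parameters are illustrative assumptions,
# not the real BacklinkJobUtils API.
def partition_backlink_job(page_ids, leaf_count=3, batch=300):
    head, tail = page_ids[:batch], page_ids[batch:]
    per_leaf = -(-len(head) // leaf_count)  # ceiling division
    leaves = [head[i:i + per_leaf] for i in range(0, len(head), per_leaf)]
    remnant = tail or None  # re-queued to divide again; keeps the rootJobTimestamp
    return leaves, remnant

leaves, remnant = partition_backlink_job(list(range(1000)))
print(len(leaves), len(remnant))  # → 3 700
```

A bug producing a loop here would mean the remnant never shrinks, so each division would re-queue essentially the same job forever.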
