alaa_wmde added a comment.
> - It's a SPOF, if mwmaint1002 node goes down for HW issues, we can't dispatch at all. If there's a need to restart the node, dispatching has to stop until it's done.
> - "Noisy neighbor" effect, people run maintenance scripts in the mwmaint node, it can be choked to death by other scripts and it can make running maintenance scripts impossible by having bugs that eats all of the resources.
> - The distributed system we designed for this (pulling the wikis using three cronjobs, dispatching and picking up basically random + most stalled ones). This can use the great infrastructure for jobqueues we have.
> - Cronjobs are hard to debug, moving them to jobqueue makes it easier to debug in logstash.

This sounds like a nice idea. So that we can prioritize it properly, do we have any measurements or hypotheses about what this change would solve or gain? Or is it mainly about anticipating future risks and addressing them proactively?

TASK DETAIL
  https://phabricator.wikimedia.org/T48643

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: alaa_wmde
Cc: alaa_wmde, Ladsgroup, Joe, Addshore, Aklapper, Tobi_WMDE_SW, JanZerebecki, Wikidata-bugs, Abraham, Nemo_bis, Denny, aude, Ricordisamoa, Lydia_Pintscher, daniel, hoo, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Mbch331