alaa_wmde added a comment.

  > - It's a SPOF: if the mwmaint1002 node goes down for hardware issues, we can't dispatch at all. If the node needs to be restarted, dispatching has to stop until it's done.
  > - "Noisy neighbor" effect: people run maintenance scripts on the mwmaint node, so the node can be choked to death by other scripts, and dispatching bugs that eat all of the resources can make running maintenance scripts impossible.
  > - The distributed system we designed for this (pulling the wikis using three cronjobs, dispatching, and picking up basically random plus the most stalled ones) could instead use the great job queue infrastructure we have.
  > - Cronjobs are hard to debug; moving them to the job queue makes debugging in Logstash easier.
  
  This sounds like a nice idea.
  
  To prioritize this properly, do we have any measurements or hypotheses about what we would solve or gain with this change? Or are we just anticipating future risks and trying to address them proactively?

TASK DETAIL
  https://phabricator.wikimedia.org/T48643


To: alaa_wmde
Cc: alaa_wmde, Ladsgroup, Joe, Addshore, Aklapper, Tobi_WMDE_SW, JanZerebecki, 
Wikidata-bugs, Abraham, Nemo_bis, Denny, aude, Ricordisamoa, Lydia_Pintscher, 
daniel, hoo, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
