EBernhardson added a comment.

Unfortunately we've had this problem several times before, and it can be quite hard to distinguish jobs that are merely reacting to a growing backlog from jobs that are actually causing the problem.

Looking at the graphs of total queue size, the important window seems to be somewhere between perhaps 8/8 and 8/15, when the job queue started growing; it hasn't declined since.

Things to look at:

  • Jobs all record a token that indicates which web request issued the job. This token propagates, so even if there are multiple levels of jobs, their tokens will all be the same. These tokens are used when logging anything about the job to logstash or udp2log. If a large number of jobs all have the same token, that is a hint of where to look.
  • Increased activity in certain job types might be a useful signal, but it depends. Some job types will naturally put more pressure on the queue as it builds up to unexpected levels, so their increase can be a symptom rather than the cause.
  • ???
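
The token check from the first bullet could be sketched roughly like this. It is only an illustration: it assumes the relevant log entries have already been exported as a list of dicts, and the field names `token` and `job_type` (as well as the sample values) are hypothetical placeholders, not the real logstash schema.

```python
from collections import Counter

def dominant_tokens(entries, top=3):
    """Return (token, count, share) for the most common request tokens.

    `entries` is an iterable of dicts; the "token" key is a hypothetical
    name for the field holding the originating request token.
    """
    counts = Counter(e["token"] for e in entries if e.get("token"))
    total = sum(counts.values())
    # most_common() is empty when there are no tokens, so no division by zero.
    return [(tok, n, n / total) for tok, n in counts.most_common(top)]

# Made-up sample data, for illustration only.
sample = (
    [{"token": "aaa111", "job_type": "exampleJobA"}] * 8
    + [{"token": "bbb222", "job_type": "exampleJobB"}] * 2
)

for tok, n, share in dominant_tokens(sample):
    print(f"{tok}: {n} entries ({share:.0%})")
```

A single token accounting for a large share of the entries would be the hint described above; what counts as "large" is a judgment call.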

Actions taken before:

  • Spin up more job queue workers. In the past this has worked to clear out the backlog, but it depends on what the backlog contains. For example, jobs can be kicked back into the queue if database slave lag is too high for them to run, in which case more workers won't do anything. We also had a problem once where a job that recursively issued more jobs got stuck in a loop, and running more jobs just created more jobs.
  • Drop all the jobs. Very drastic, not desirable. But it's been done multiple times. After finding the bugged job that recursively issued more jobs we ended up dropping all instances of that job from the queues.
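
To make the recursive-job case above concrete, here is a toy model, not a description of the real job runner. It assumes each worker runs one job per tick and that every job that runs enqueues a fixed number of child jobs; the function name and all parameters are made up for illustration.

```python
def queue_size_after(initial, workers, children_per_job, ticks):
    """Toy model of queue size after `ticks` rounds of processing.

    Each tick, up to `workers` jobs run. Every job that runs is removed
    from the queue and enqueues `children_per_job` new jobs.
    """
    size = initial
    for _ in range(ticks):
        ran = min(workers, size)
        size += ran * (children_per_job - 1)
    return size

# Ordinary jobs (no children): doubling the workers drains the queue faster.
print(queue_size_after(100, workers=10, children_per_job=0, ticks=5))
print(queue_size_after(100, workers=20, children_per_job=0, ticks=5))

# Looping job that issues two children: doubling the workers grows it faster.
print(queue_size_after(100, workers=10, children_per_job=2, ticks=5))
print(queue_size_after(100, workers=20, children_per_job=2, ticks=5))
```

In this model, once a job issues at least as many jobs as it completes, adding workers cannot shrink the queue, which is why finding and dropping the bugged job was the fix rather than adding capacity.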

TASK DETAIL
https://phabricator.wikimedia.org/T173710
