daniel added a comment.

@ori We do not have agreement on how to fix this. Dispatching via the job queue would certainly be good, but it would not change the number of databases the dispatch code needs to talk to. As far as I can see, it would do nothing to fix the current problem.
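To make the connection-count point concrete, here is a toy model in Python of what connection re-use is supposed to achieve for a dispatcher. If re-use works, the number of open connections should track the number of database hosts, not the number of client wikis, regardless of whether dispatch runs from a script or the job queue. Everything in it is a hypothetical assumption for illustration: the class and function names, the per-host cache, and the wiki-to-host mapping. It is not the MediaWiki LoadBalancer API and not the actual SqlChangeDispatchCoordinator code. The lock helpers only loosely mirror the roles of engageClientLock() and releaseClientLock().

```python
from collections import defaultdict


class HypotheticalLoadBalancer:
    """Toy model of connection re-use (an assumption, not MediaWiki's API):
    connections are cached per database host, so all client wikis that live
    on the same host share one connection."""

    def __init__(self, host_for_wiki, reuse=True):
        self.host_for_wiki = host_for_wiki  # client wiki id -> database host
        self.reuse = reuse                  # False models "new connection per wiki"
        self.cached = {}                    # host -> open connection (a string here)
        self.in_use = defaultdict(int)      # host -> callers currently holding it
        self.opened = 0                     # connections ever opened

    def get_connection(self, wiki):
        host = self.host_for_wiki[wiki]
        if not self.reuse or host not in self.cached:
            self.opened += 1
            self.cached[host] = f"conn{self.opened}@{host}"
        self.in_use[host] += 1
        return self.cached[host]

    def reuse_connection(self, wiki):
        # Hand the connection back; it stays open for the next caller.
        self.in_use[self.host_for_wiki[wiki]] -= 1


def engage_client_lock(lb, locks, wiki):
    # Hypothetical: like MySQL GET_LOCK, a named lock is bound to the session
    # that took it, so that connection must stay open until the lock is released.
    conn = lb.get_connection(wiki)
    locks[wiki] = conn
    return conn


def release_client_lock(lb, locks, wiki):
    # Drop the lock first, then return the connection for re-use.
    locks.pop(wiki)
    lb.reuse_connection(wiki)


def dispatch_round(lb, wikis):
    """Lock, dispatch to, and unlock each client wiki; return connections opened."""
    locks = {}
    for wiki in wikis:
        engage_client_lock(lb, locks, wiki)
        # ... dispatch pending changes to `wiki` here ...
        release_client_lock(lb, locks, wiki)
    return lb.opened


# Hypothetical setup: six client wikis spread over two database hosts.
host_for_wiki = {f"wiki{i}": ("db-a" if i % 2 else "db-b") for i in range(1, 7)}
print(dispatch_round(HypotheticalLoadBalancer(host_for_wiki), host_for_wiki))
print(dispatch_round(HypotheticalLoadBalancer(host_for_wiki, reuse=False), host_for_wiki))
```

Under these assumptions the first call opens 2 connections (one per host) and the second opens 6 (one per wiki). That gap is what working re-use should buy, independent of how dispatching is triggered.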
As far as I know, we are using core methods that are supposed to facilitate connection re-use. This was broken in core and was fixed by Aaron. Now Jynus says he doesn't believe that fix is going to help in our situation. I'm curious why he believes that. It would also be good to get an answer to the question I asked in https://phabricator.wikimedia.org/T118162#1796490. Basically, I see three possibilities:

1) I am misunderstanding what the relevant core methods in LoadBalancer do.
2) I'm using them wrong.
3) They are broken.

It would be very useful to get some feedback, at least on whether SqlChangeDispatchCoordinator::engageClientLock() and SqlChangeDispatchCoordinator::releaseClientLock() are doing something obviously wrong.

One possible fix for the connection issue is to change the locking mechanism. That would not be very hard to code, but it would be annoying to deploy: we'd need to stop all dispatch scripts to make sure no old code runs in parallel with new code. It would also mean side-stepping the problem without understanding its cause, and I would really like to understand the cause.

TASK DETAIL
https://phabricator.wikimedia.org/T118162

To: daniel
Cc: Tobi_WMDE_SW, ori, mobrovac, thiemowmde, aaron, jcrespo, gerritbot, daniel, aude, hoo, Lydia_Pintscher, Addshore, Aklapper, Joe, Wikidata-bugs, Mbch331, Krenair

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
