daniel added a comment.

@ori We do not have agreement on how to fix this. Dispatching via the job queue 
would certainly be good, but it would change nothing regarding the number of 
databases the dispatch code needs to talk to. As far as I can see, it would do 
nothing to fix the current problem.

As far as I know, we are using core methods that are supposed to facilitate 
connection re-use. This was broken in core, and was fixed by Aaron. Now Jynus 
says that he doesn't believe that fix is going to help our situation. I'm 
curious about why he believes that.

It would also be good to get an answer to the question I asked in 
https://phabricator.wikimedia.org/T118162#1796490. Basically, I see three 
possibilities: 1) I am misunderstanding what the relevant core methods in 
LoadBalancer do, 2) I'm using them wrong, or 3) they are broken. It would be very 
useful to get some feedback, at least on the question of whether 
SqlChangeDispatchCoordinator::engageClientLock() and 
SqlChangeDispatchCoordinator::releaseClientLock() are doing something obviously 
wrong.

One possible fix to the connection issue is to change the locking mechanism. 
That would not be very hard to code, but annoying to deploy - we'd need to stop 
all dispatch scripts to make sure no old code runs in parallel with new code. 
This would also mean side-stepping the problem without understanding the cause. 
And I would really like to understand the cause.


TASK DETAIL
  https://phabricator.wikimedia.org/T118162

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: daniel
Cc: Tobi_WMDE_SW, ori, mobrovac, thiemowmde, aaron, jcrespo, gerritbot, daniel, 
aude, hoo, Lydia_Pintscher, Addshore, Aklapper, Joe, Wikidata-bugs, Mbch331, 
Krenair



_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
