daniel added a comment.

In https://phabricator.wikimedia.org/T118162#1813400, @jcrespo wrote:

> Connection re-use does not work if you open 900 connections at the same time 
> every 3 minutes. I have been saying that for a long time. If a DBA has to 
> explain connection reuse vs pool of connections... :-/


There seems to be a fundamental misunderstanding here. The script is not 
designed to open 900 connections, nor should it be possible for it to do so, 
if the code in core works as I understand it. I suspect that we are talking 
about "connection re-use" at completely different levels: I'm talking about 
re-use inside the same PHP invocation, not between requests. Do you know the 
relevant code in the LoadBalancer class? Can you tell me in what respect I am 
misunderstanding or misusing it? Maybe Aaron could shed some light on this.

> > We could just as well place the locks for all client wikis on the wikidata 
> > master db. Then there should be no reason to connect to the client database 
> > at all (assuming the job queue is not using mysql).

> Please don't. You are moving the problem from one server to another (and a 
> more critical server). A good start would be to close connections as soon as 
> they are not needed anymore- instead of idling them. Whatever you do, please 
> do not test it in production.


The script shouldn't really be idling the connection; it is actually bounded by 
SQL query speed on the repo's master DB. No queries are run against other 
databases. I do not see how we would "shift the problem": the problem is too 
many connections, and what I suggest is to use one connection for everything.

We were connecting to the client databases just to obtain a named lock on that 
database. The connection then stayed open (not so great), but (as I understand 
LoadBalancer) we would use the same connection for all wikis on the same 
cluster, so we might end up with 10 or so open connections per script. Not 
ideal, but not catastrophic, unless the connection wasn't getting re-used and 
we ended up creating a new one for every wiki. To me, that seems to be the 
problem: a problem caused by the bug in core that Aaron fixed.
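
To illustrate the difference between the two behaviours described above, here 
is a minimal toy model in Python. It is not MediaWiki's LoadBalancer and makes 
no claim about how that class is implemented; the class name, the cluster 
layout, and the figure of 900 wikis spread over 10 clusters are assumptions 
chosen only to mirror the numbers mentioned in this thread.

```python
class ToyConnectionCache:
    """Hypothetical stand-in for per-cluster connection re-use."""

    def __init__(self, wiki_to_cluster, reuse=True):
        self.wiki_to_cluster = wiki_to_cluster
        self.reuse = reuse
        self.open_connections = {}

    def get_connection(self, wiki):
        # With re-use, connections are cached per cluster;
        # without it, every wiki ends up with its own connection.
        key = self.wiki_to_cluster[wiki] if self.reuse else wiki
        if key not in self.open_connections:
            self.open_connections[key] = object()  # placeholder for a real handle
        return self.open_connections[key]


def count_connections(wiki_to_cluster, reuse):
    """Touch every wiki once and report how many connections were opened."""
    cache = ToyConnectionCache(wiki_to_cluster, reuse=reuse)
    for wiki in wiki_to_cluster:
        cache.get_connection(wiki)
    return len(cache.open_connections)


# Assumed layout: 900 client wikis spread evenly over 10 clusters.
wikis = {f"wiki{i}": f"cluster{i % 10}" for i in range(900)}
```

Under these assumptions, `count_connections(wikis, reuse=True)` opens one 
connection per cluster, while `count_connections(wikis, reuse=False)` opens 
one per wiki, which is the failure mode I suspect we were hitting.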

We could start to forcibly close the connection after every query, even if it's 
just a second or less until we are going to fire the next one. But we'd have to 
discuss why we should do that in this case, and not in others. Explicitly 
closing connections is generally Not Done in MediaWiki - we keep the same 
connection(s) alive for the duration of the request. As far as I can see, 
DatabaseBase::close and LoadBalancer::closeConnection are never called in any 
regular maintenance script or during web requests.

If the "request" (script run) lasts 10 minutes, that can of course be 
problematic, especially if there are many such scripts, and the connection 
isn't actually used. But the connection to the repo database *is* used. And we 
don't have hundreds of script instances.

Closing the connection to slave DBs (instead of using the same connection for 
all wikis on the same master), as Aude suggests, would work around the issue of 
connection re-use not working properly. My patch also works around the problem, 
by using the connection to the repo wiki for everything (which requires a 
change to the locking logic). But with LoadBalancer working correctly, neither 
should be needed to avoid opening hundreds of connections. If hundreds are 
opened anyway, that's either a core bug (probably the one Aaron fixed), or a 
fundamental misunderstanding on my part (about how LoadBalancer is supposed to 
work).
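
As a rough sketch of what "using the connection to the repo wiki for 
everything" could mean for the locking logic: instead of taking a lock on each 
client database, the lock name itself carries the client wiki, and all locks 
live in one place. The Python below is an in-memory toy, not the actual patch; 
`RepoMasterLocks`, `lock_name`, the `dispatch:` prefix, and the script names 
are all hypothetical.

```python
class RepoMasterLocks:
    """Toy in-memory stand-in for named locks held on the repo master."""

    def __init__(self):
        self._owners = {}

    def acquire(self, name, owner):
        # setdefault only stores the owner if the name is currently free,
        # so a second script asking for the same name is refused.
        return self._owners.setdefault(name, owner) == owner

    def release(self, name, owner):
        # Only the current holder may release the lock.
        if self._owners.get(name) == owner:
            del self._owners[name]


def lock_name(client_wiki):
    """Encode the client wiki in the lock name (hypothetical naming scheme)."""
    return f"dispatch:{client_wiki}"
```

Two script instances contending for the same client wiki are then serialized 
by the lock name alone, and neither needs a connection to the client database 
just for locking.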

There is a lot that can and should be improved about the dispatching process. 
But the //massive// problem we are seeing is not caused by the script working 
as designed. It's caused by the script misbehaving due to, as far as I can 
tell, a core bug, which I think is now fixed.


TASK DETAIL
  https://phabricator.wikimedia.org/T118162
