gerritbot added a comment.
Change 253934 had a related patch set uploaded (by Ori.livneh):
Make getLaggedSlaveMode() use reuseConnection() as needed
https://gerrit.wikimedia.org/r/253934
TASK DETAIL
https://phabricator.wikimedia.org/T118162
EMAIL PREFERENCES
gerritbot added a comment.
Change 253934 merged by Ori.livneh:
Make getLaggedSlaveMode() use reuseConnection() as needed
https://gerrit.wikimedia.org/r/253934
daniel added a comment.
@mobrovac Yes, and I'll be very happy to replace the entire change dispatching
clutch with an actual event bus. I suspect it'll be a few months until we can
do that, though. I'm in touch with Gabriel about this, though I haven't looked
into the details. I did explain to
daniel added a comment.
In https://phabricator.wikimedia.org/T118162#1813400, @jcrespo wrote:
> Connection re-use does not work if you open 900 connections at the same time
> every 3 minutes. I have been saying that for a long time. If a DBA has to
> explain connection reuse vs pool of
jcrespo added a comment.
You are opening one connection per wiki. That is wrong. Locking per wiki will
serve nothing.
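The difference between one connection per wiki and one connection per database server can be sketched as follows. This is only an illustration, not MediaWiki code: `FakeConnection`, `PerServerConnections`, `select_db`, and the wiki-to-server mapping are all hypothetical names and data chosen for the example.

```python
class FakeConnection:
    """Stand-in for a database connection (hypothetical); counts how many are opened."""
    opened = 0

    def __init__(self, server):
        FakeConnection.opened += 1
        self.server = server
        self.current_db = None

    def select_db(self, wiki):
        # Switch the default database on the existing connection
        # instead of opening a new connection for each wiki.
        self.current_db = wiki


class PerServerConnections:
    """Hands out one shared connection per server, however many wikis live on it."""

    def __init__(self, wiki_to_server):
        self._wiki_to_server = wiki_to_server
        self._by_server = {}

    def get(self, wiki):
        server = self._wiki_to_server[wiki]
        conn = self._by_server.get(server)
        if conn is None:
            conn = FakeConnection(server)
            self._by_server[server] = conn
        conn.select_db(wiki)
        return conn


# Assumed mapping for illustration: three wikis hosted on the same server.
wikis = {"ruwikibooks": "s3", "sahwiki": "s3", "fiwikinews": "s3"}
manager = PerServerConnections(wikis)
for wiki in wikis:
    manager.get(wiki)

print(FakeConnection.opened)  # 1 connection, not 3
```

Keying the cache by server rather than by wiki is what keeps the connection count proportional to the number of servers.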
jcrespo added a comment.
| 206189024 | wikiadmin | 10.64.32.13:49847 | ruwikibooks | Sleep |
| 206189066 | wikiadmin | 10.64.32.13:49848 | sahwiki | Sleep |
| 206189112 | wikiadmin | 10.64.32.13:49850 | fiwikinews | Sleep |
|
ori added a comment.
https://phabricator.wikimedia.org/tag/wikidata/ folks (especially @daniel and
@Lydia_Pintscher): Just a quick note to apologize for my conduct on this task.
I think I was quick to escalate things, and I did it in a manner that probably
only succeeded at raising the stress
Lydia_Pintscher added a comment.
@ori: Don't worry. We have it fixed and will do some more to make it better. I
understand it sucks when these things break so badly.
jcrespo added a comment.
I am thinking of killing and banning these connections right now from s3
because it is breaking our database servers. It is still creating 1000-2000
idle connections to db1035.
gerritbot added a comment.
Change 253889 had a related patch set uploaded (by Aude):
Close database connections in SqlChangeDispatchCoordinator
https://gerrit.wikimedia.org/r/253889
ori added a comment.
In https://phabricator.wikimedia.org/T118162#1795369, @jcrespo wrote:
> - The cron jobs run as wikiadmin, which are on purpose not limited in
> execution time, so there is no protection against them overflowing a database
> server
In
jcrespo added a comment.
> Now Jynus says that he doesn't believe that fix is going to help our
> situation. I'm curious about why he believes that.
Connection re-use does not work if you open 900 connections at the same time
every 3 minutes. I have been saying that for a long time. If a DBA
aude added a comment.
from irc:
05:33 < aude> jynus: would it be worth to backport and deploy
https://gerrit.wikimedia.org/r/#/c/252267/ ?
05:34 < aude> would be really good to know that this helps or not (or how
much)
05:34 < jynus> I do not think that will work at all
daniel added a comment.
@ori We do not have agreement on how to fix this. Dispatching via the job queue
would certainly be good, but it would change nothing regarding the number of
databases the dispatch code needs to talk to. As far as I can see, it would do
nothing to fix the current problem.
gerritbot added a comment.
Change 253889 abandoned by Aude:
Close database connections in SqlChangeDispatchCoordinator
https://gerrit.wikimedia.org/r/253889
gerritbot added a comment.
Change 253898 had a related patch set uploaded (by Daniel Kinzler):
ChangeDispatcher should use locks on the local DB.
https://gerrit.wikimedia.org/r/253898
daniel added a comment.
Regarding https://gerrit.wikimedia.org/r/253898: this should fix any issues
with connections to client wiki dbs. However, it does so at the cost of
changing our locking mechanism, which means it cannot be deployed easily.
The original issue however is that //the
thiemowmde added a subscriber: thiemowmde.
thiemowmde added a comment.
The https://phabricator.wikimedia.org/tag/wikidata/ team will check if this is
resolved in https://phabricator.wikimedia.org/tag/wikidata-sprint-2015-11-17/.
daniel added a comment.
@jcrespo My point is that LoadBalancer::reuseConnection should make sure that I
get an existing connection, not a new one, when I ask for a connection to a
wiki on the same cluster, if there already is a connection to that cluster. If
used correctly, this means that the
gerritbot added a comment.
Change 252267 had a related patch set uploaded (by Aaron Schulz):
Make getLaggedSlaveMode() use reuseConnection() as needed
https://gerrit.wikimedia.org/r/252267
gerritbot added a comment.
Change 252267 merged by jenkins-bot:
Make getLaggedSlaveMode() use reuseConnection() as needed
https://gerrit.wikimedia.org/r/252267
gerritbot added a subscriber: gerritbot.
gerritbot added a comment.
Change 252179 had a related patch set uploaded (by Giuseppe Lavagetto):
maintenance: run at most one wikidata dispatchChanges instance at a time
https://gerrit.wikimedia.org/r/252179
gerritbot added a comment.
Change 252179 merged by Giuseppe Lavagetto:
maintenance: run at most three wikidata dispatchChanges instance at a time
https://gerrit.wikimedia.org/r/252179
jcrespo added a comment.
> the connections should be pooled
We only pool connections on the server side; there is no such thing as client
connection pooling in MediaWiki right now.
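For readers unfamiliar with the term, a client-side pool differs from plain connection reuse mainly in that it caps how many connections can be open at once. Below is a minimal sketch of that idea, assuming a thread-based client. `BoundedPool` and its parameters are hypothetical, and this does not describe anything that exists in MediaWiki.

```python
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical client-side pool: never more than max_connections open."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self._idle = []
        self.created = 0  # how many connections were ever opened

    @contextmanager
    def connection(self, timeout=None):
        # Callers wait for a free slot instead of opening extra connections.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("pool exhausted")
        with self._lock:
            if self._idle:
                conn = self._idle.pop()
            else:
                self.created += 1
                conn = object()  # placeholder for a real connection
        try:
            yield conn
        finally:
            # Return the connection before freeing the slot, so the next
            # caller finds it idle rather than creating a new one.
            with self._lock:
                self._idle.append(conn)
            self._slots.release()


pool = BoundedPool(max_connections=2)


def worker():
    with pool.connection():
        pass  # a query would run here


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(pool.created <= 2)  # True: 20 workers shared at most 2 connections
```

With a cap like this, a burst of workers queues on the client instead of piling idle connections onto the database server.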
aaron added a subscriber: aaron.
aaron added a comment.
See "+channel:DBPerformance +message:"*connections made*"" at
logstash.wikimedia.org. I see lots of concurrent connections from mw1152
scripts (if reuseConnection was called properly I'd assume there would be ~7 or
so, since going from DB
jcrespo added a comment.
To update the latest issues identified:
- As this creates one connection per wiki, it ends up opening 1800 connections
to s3 servers (actual measurement)
- Most of these connections are idle, not doing actual work, which makes them
unnecessary, while creating the same issue
daniel added a comment.
In https://phabricator.wikimedia.org/T118162#1795441, @jcrespo wrote:
> We only pool connections on the server side; there is no such thing as client
> connection pooling in MediaWiki right now.
I'm confused - what is it that `LoadBalancer::reuseConnection` does, then?
aaron added a comment.
I'll try to look into whether the problem is in core or not, though I need to
spend time on some other tasks too.
jcrespo added a comment.
We can go into details of terminology, but reusing connections != pool. There
are many differences, some of which require persistent connections. But the key
factor (especially in this case) is that in a pool of connections there is an
**upper limit** of maximum