Imarlier added a comment.
@Gehel have the patches referenced above been deployed?
TASK DETAIL
https://phabricator.wikimedia.org/T202764
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Imarlier
Cc: Stashbot, Marostegui, Banyek, Reedy, gerritbot, Krinkle, Addshore,
gerritbot added a comment.
Change 471982 merged by jenkins-bot:
[wikidata/query/rdf@master] Instrument HTTP connection manager to expose JMX metrics.
https://gerrit.wikimedia.org/r/471982
gerritbot added a comment.
Change 471737 merged by jenkins-bot:
[wikidata/query/rdf@master] Evict idle connections from HTTP pool.
https://gerrit.wikimedia.org/r/471737
gerritbot added a comment.
Change 471982 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/rdf@master] Instrument HTTP connection manager to expose JMX metrics.
https://gerrit.wikimedia.org/r/471982
gerritbot added a comment.
Change 471737 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/rdf@master] Evict idle connections from HTTP pool.
https://gerrit.wikimedia.org/r/471737
Smalyshev added a comment.
T202765 is about specific bot misbehavior, and was restricted because it contains samples of the logs and details about a specific bot, which can contain PII. It is OK to see those for people under NDA, but it should not be public.
Gehel added a comment.
For context, T202765 is about a bot sending annoying and somewhat expensive requests. That specific issue is now resolved.
Ladsgroup added a comment.
It's so restricted that even with access to all security issues, I still don't have access to it.
Addshore added a comment.
*is also interested in what T202765 is*
Imarlier added a comment.
Can someone add me to T202765, mentioned above by @Gehel? The issue that @Smalyshev created last week seems to involve requests that are never getting to wikidata, which suggests a wdqs related issue of some sort (and thus potentially related to spikes in resource use or
gerritbot added a comment.
Change 460613 merged by jenkins-bot:
[mediawiki/core@master] debug: Allow the DBQuery channel to be used
https://gerrit.wikimedia.org/r/460613
Marostegui added a comment.
eqiad db API hosts return the query in less than a second as they have the correct schema:
root@db1104.eqiad.wmnet[wikidatawiki]> select @@hostname;
+------------+
| @@hostname |
+------------+
| db1104     |
+------------+
1 row in set (0.00 sec)
Marostegui added a comment.
300 errors in the last 24h, I think we are good?
Marostegui added a comment.
These are the last 24h: https://logstash.wikimedia.org/goto/cd0af28f39b7ad679b9d1e1130636fdf
Errors are almost gone now.
Marostegui added a comment.
In T202764#4595562, @Smalyshev wrote:
Looking at logstash: https://logstash.wikimedia.org/goto/39a6fe9edd787798129b66ae9d61ed90 there's definitely a drop in timeouts, but they are still present, so I will monitor this further.
By looking at that graph but expanded
Smalyshev added a comment.
Looking at logstash: https://logstash.wikimedia.org/goto/39a6fe9edd787798129b66ae9d61ed90 there's definitely a drop in timeouts, but they are still present, so I will monitor this further.
Marostegui added a comment.
@Smalyshev have you noticed any improvements since the above comment was made, and the index is gone from everywhere but the recentchanges slaves (like in eqiad)?
Marostegui added a comment.
@Smalyshev eqiad and codfw are not the same.
The index only exists on recentchanges replicas and the masters (you can ignore dbstore1002, it is not used in production).
root@neodymium:/home/marostegui# ./section s8 | while read host port ; do echo "$host:$port" ;
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T06:21:10Z] Drop tmp_2 and tmp_3 index from wikidatawiki.recentchanges on dbstore2001, db2079, db2082, db2083 - T202764
Marostegui added a comment.
In T202764#4592236, @Smalyshev wrote:
The API requests for recentchanges now seem to be faster, but I still get exceptions in the log :( I also get a bunch of errors for Wikidata URLs like:
Smalyshev added a comment.
The API requests for recentchanges now seem to be faster, but I still get exceptions in the log :( I also get a bunch of errors for Wikidata URLs like: https://www.wikidata.org/wiki/Special:EntityData/Q33799921.ttl?nocache=1537250691109=dump
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T05:28:56Z] Synchronized wmf-config/db-codfw.php: Repool db2081 - T202764 (duration: 00m 49s)
Marostegui added a comment.
@Smalyshev - the indexes have been removed from the API hosts.
The queries on those two servers (db2080 and db2081) now take around 0.05 sec to run. Can you check if this makes a difference from your end?
Keep in mind that the indexes still exist on other hosts
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T05:22:45Z] Synchronized wmf-config/db-codfw.php: Depool db2081 - T202764 (duration: 00m 49s)
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T05:22:53Z] Drop tmp_2 and tmp_3 index from wikidatawiki.recentchanges on db2081 - T202764
gerritbot added a comment.
Change 461027 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2081
https://gerrit.wikimedia.org/r/461027
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T05:14:49Z] Synchronized wmf-config/db-codfw.php: Repool db2080 - T202764 (duration: 00m 49s)
gerritbot added a comment.
Change 461027 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2081
https://gerrit.wikimedia.org/r/461027
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T05:08:56Z] Drop tmp_2 and tmp_3 index from wikidatawiki.recentchanges on db2080 - T202764
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-09-18T05:08:25Z] Synchronized wmf-config/db-codfw.php: Depool db2080 - T202764 (duration: 00m 51s)
gerritbot added a comment.
Change 461025 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2080
https://gerrit.wikimedia.org/r/461025
gerritbot added a comment.
Change 461025 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2080
https://gerrit.wikimedia.org/r/461025
Marostegui added a comment.
In T202764#4590949, @Smalyshev wrote:
I am a bit confused by now - is the original problem because recentchanges is using a wrong host, or it's using right host and the indexes there are wrong, or something else? And how can it be fixed? WDQS poller depends on RC API,
Smalyshev added a comment.
I am a bit confused by now - is the original problem because recentchanges is using a wrong host, or it's using right host and the indexes there are wrong, or something else? And how can it be fixed? WDQS poller depends on RC API, and having it take 30+ seconds instead
Marostegui added a comment.
In T202764#4589641, @jcrespo wrote:
The plan is for us DBAs to test setting up a single API host with the same structure as eqiad, and then do all of them assuming that fixes it; later we will have to evaluate what is the right long-term status, given some unknowns and related
jcrespo added a comment.
The plan is for us DBAs to test setting up a single API host with the same structure as eqiad, and then do all of them assuming that fixes it; later we will have to evaluate what is the right long-term status, given some unknowns and related tasks such as T202167:
One note: This
Marostegui added a comment.
In T202764#4587022, @Krinkle wrote:
@Marostegui So to confirm, recentchanges db hosts are the same within and between eqiad/codfw. But the api db hosts are different, right? Only api dbhosts in codfw have the two extra indexes. Is that right?
Correct, recentchanges
Krinkle added a comment.
@Marostegui So to confirm, recentchanges db hosts are the same within and between eqiad/codfw. But the api db hosts are different, right? Only api dbhosts in codfw have the two extra indexes. Is that right?
Also, are there differences between recentchanges db hosts and
Krinkle added a comment.
Earlier when I was investigating with Stas, performing the slow recentchanges api request with X-Wikimedia-Debug: log shows the following in Logstash:
[DEBUG] [DBConnection] Wikimedia\Rdbms\LoadBalancer::getReaderIndex: using server db2080 for group 'api'
Which indeed
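To check many such Logstash lines for which replica served a request, a small parser can help. This is only a sketch in Python: the function name `parse_reader_index` is hypothetical, and the pattern is written against the single log line quoted above, so other message wordings are not covered.

```python
import re
from typing import Optional, Tuple

# Pattern based only on the one log line quoted in this thread (assumption:
# the message wording is stable across requests).
READER_INDEX_RE = re.compile(
    r"LoadBalancer::getReaderIndex: using server (?P<server>\S+) "
    r"for group '(?P<group>[^']*)'"
)


def parse_reader_index(line: str) -> Optional[Tuple[str, str]]:
    """Return (server, group) for a getReaderIndex log line, else None."""
    match = READER_INDEX_RE.search(line)
    if match is None:
        return None
    return match.group("server"), match.group("group")
```

Applied to the line above, this would return the server name and the query group, which is enough to count requests per replica.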
Reedy added a comment.
In T202764#4585276, @Smalyshev wrote:
I tried db2085:3318 and the result is the same as on the other codfw host. So if that's what actual API is using, that could be the reason why it is so slow.
It should be the rc hosts it's using for those queries
Smalyshev added a comment.
I tried db2085:3318 and the result is the same as on the other codfw host. So if that's what actual API is using, that could be the reason why it is so slow.
Smalyshev added a comment.
@Reedy I am not sure which host, I just logged in to maintenance host for eqiad and codfw. Lookups show db2082.codfw.wmnet and db1092.eqiad.wmnet.
Smalyshev added a comment.
This query:
SELECT rc_id,rc_timestamp,rc_namespace,rc_title,rc_cur_id,rc_type,rc_deleted,rc_this_oldid,rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='2018091411') AND rc_namespace IN ('0','120') AND rc_type IN ('0','1','3','6') ORDER BY
gerritbot added a comment.
Change 460613 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] debug: Allow the DBQuery channel to be used
https://gerrit.wikimedia.org/r/460613
Smalyshev added a comment.
Doesn't seem to be WDQS related entirely - e.g. if I call 'https://www.wikidata.org/w/api.php?format=json=""> - i.e. try to load 100 items from start of the day today - it takes 29 seconds:
real 0m29.500s
user 0m0.023s
sys 0m0.011s
I don't think it's normal for this
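The same wall-clock measurement can be repeated from a script rather than with `time`. A minimal sketch in Python, assuming nothing about the API itself: the helper names `timed` and `is_slow` are hypothetical, and the 5 second threshold is an arbitrary choice, not a value from this task.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def is_slow(elapsed_seconds: float, threshold_seconds: float = 5.0) -> bool:
    """Flag a call as slow; the default threshold is an arbitrary assumption."""
    return elapsed_seconds > threshold_seconds
```

One way to use it would be `timed(lambda: urllib.request.urlopen(url, timeout=60).read())`, with `url` set to the request under test, comparing the elapsed time between datacenters or before and after a schema change.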
Gehel added a comment.
The issue as seen from WDQS can be followed on logstash.
Smalyshev added a comment.
Getting lots of these errors from Wikidata again today. Any ideas what could be causing this? This really messes up updates for the query service.
Smalyshev added a comment.
UsageAspectTransformer seems to be part of Wikibase client, so it shouldn't be involved in recentchanges API on wikidata.org.
jcrespo added a comment.
@Ladsgroup That could be a separate issue and could be handled on a separate task (I don't know). The on-topic issue here is UsageAspectTransformer.php (probably you saw that, just in case my comment was misleading about what was the main actionable here).
jcrespo added a comment.
I got this in the mediawiki logs at the same time as one of the retries. Are you sure you are not using a deprecated query? I don't know if they are correlated, but it happened in the same time frame and it is json related.
Warning: API call had warnings trying to get
Smalyshev added a comment.
@Imarlier yes, I suspect it might be Wikidata recentchanges API being slow, and I wonder if there's a way to check that (i.e. whether there are a lot of slow requests there and whether there are bumps in slow requests correlated with WDQS recentchanges load fails).
Imarlier added a comment.
Hey, @Smalyshev -- Did you tag perf team on this because you're hoping that we can help with the investigation?
Gehel added a comment.
Digging into this a bit more from the WDQS side, we see a few interesting things:
The NoHttpResponseException seems to not be a timeout client side, but an empty response (not even headers), with a state transition. It looks similar to what we would see if an intermediate
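One common way for a client to receive an empty response is to reuse a pooled connection that the other side has already closed, which is what the patch titled "Evict idle connections from HTTP pool" appears to address. The following toy sketch in Python only illustrates the eviction idea; it is not the WDQS change and not the Apache HttpClient API, and `IdlePool` and `max_idle_seconds` are hypothetical names.

```python
import time
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

C = TypeVar("C")


class IdlePool(Generic[C]):
    """Toy pool that refuses to hand out connections idle for too long."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._idle: List[Tuple[float, C]] = []  # (time released, connection)

    def release(self, conn: C) -> None:
        """Return a connection to the pool, recording when it became idle."""
        self._idle.append((self._clock(), conn))

    def evict_idle(self) -> List[C]:
        """Drop connections idle longer than the limit; return the evicted ones."""
        now = self._clock()
        keep: List[Tuple[float, C]] = []
        evicted: List[C] = []
        for released_at, conn in self._idle:
            if now - released_at > self._max_idle:
                evicted.append(conn)
            else:
                keep.append((released_at, conn))
        self._idle = keep
        return evicted

    def acquire(self) -> Optional[C]:
        """Evict stale entries first, then hand out the most recently released one."""
        self.evict_idle()
        if not self._idle:
            return None
        return self._idle.pop()[1]
```

A real client would also close the evicted connections and open a fresh one when `acquire` returns nothing; both steps are left out here to keep the sketch short.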
Smalyshev added a comment.
Logstash for throttled requests