[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2019-01-09 Thread Addshore
Addshore added a comment. So the spike reported again above occurred before https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseQualityConstraints/+/464812/ was merged and deployed. WBQC now respects the 429 header and will throttle based on it. I added a new panel to
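
Respecting the 429 response means backing off for as long as the server asks via its Retry-After header. A minimal sketch of that client-side pattern in Python (the extension itself is PHP; the endpoint constant, helper name and 60-second fallback below are illustrative, not taken from the WBQC code):

    import time
    import requests

    WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # illustrative endpoint

    _throttled_until = 0.0  # wall-clock time before which we send nothing

    def run_sparql(query):
        """Run a query, honouring 429 + Retry-After by pausing further requests."""
        global _throttled_until
        if time.time() < _throttled_until:
            return None  # still throttled, skip the request entirely

        resp = requests.get(WDQS_ENDPOINT, params={"query": query, "format": "json"})
        if resp.status_code == 429:
            # Retry-After may be delta-seconds or an HTTP date; this sketch
            # only handles the seconds form and falls back to one minute.
            retry_after = resp.headers.get("Retry-After", "60")
            delay = int(retry_after) if retry_after.isdigit() else 60
            _throttled_until = time.time() + delay
            return None
        resp.raise_for_status()
        return resp.json()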

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2019-01-09 Thread gerritbot
gerritbot added a comment. Change 483112 abandoned by Addshore: Split SPARQL pre and post throttling metrics. Reason: This can probably be done with the metrics we already have. https://gerrit.wikimedia.org/r/483112

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2019-01-09 Thread gerritbot
gerritbot added a comment. Change 483112 had a related patch set uploaded (by Addshore; owner: Addshore): [mediawiki/extensions/WikibaseQualityConstraints@master] Split SPARQL pre and post throttling metrics https://gerrit.wikimedia.org/r/483112

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2019-01-08 Thread Addshore
Addshore added a comment. In T204267#4652483, @Smalyshev wrote: Happened again, bumping the priority. The spike can be seen here https://grafana.wikimedia.org/d/00344/wikidata-quality?orgId=1&from=1539267066438&to=1539596899898 It doesn't look like there was a big spike in hits to the api or

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-10-05 Thread Smalyshev
Smalyshev added a comment. Also, 1,182,961 events is a lot. What's going on there? Why so many? Is it a legit scenario? I also wonder: if most of it is regex matching, shouldn't we make some service to do just that? Using a full-blown SPARQL database to do regex matching is kinda like hammering in
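
Most of those events are format-constraint checks, which reduce to matching one string against one regular expression and need no triple store. A minimal sketch of what a dedicated regex-checking service could look like, using Flask; the /check route, parameter names and port are hypothetical, and Python's regex dialect differs slightly from the XPath-style regexes SPARQL's REGEX uses, which a real service would have to account for:

    import re
    from flask import Flask, jsonify, request

    app = Flask(__name__)  # hypothetical stand-alone format-check service

    @app.route("/check")
    def check():
        # e.g. GET /check?value=533892&regex=^(?:[1-9][0-9]+|)$  (URL-encoded)
        value = request.args.get("value", "")
        regex = request.args.get("regex", "")
        try:
            matches = re.fullmatch(regex, value) is not None
        except re.error as err:
            return jsonify({"error": str(err)}), 400
        return jsonify({"value": value, "regex": regex, "matches": matches})

    if __name__ == "__main__":
        app.run(port=8080)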

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-10-05 Thread Smalyshev
Smalyshev added a comment. Is there a reason that all mediawiki hosts show as "localhost"? This is probably coming from Jetty, which takes it from connection info. Since we have nginx in front of Blazegraph and no X-Client-IP is supplied, it probably has no way to discover the originating
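
In other words, Jetty only sees the nginx side of the TCP connection, so the real origin is only visible if something injects it into a header such as X-Client-IP. A purely illustrative sketch of a caller supplying that header itself; in production this would more naturally be added by the nginx proxy, and whether the WDQS stack would log a client-supplied value is not established here:

    import socket
    import requests

    WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # illustrative endpoint

    def run_sparql_with_origin(query):
        # Identify the calling host explicitly, since the connection Jetty
        # sees only leads back to the local nginx proxy.
        headers = {"X-Client-IP": socket.gethostbyname(socket.gethostname())}
        resp = requests.get(WDQS_ENDPOINT,
                            params={"query": query, "format": "json"},
                            headers=headers)
        resp.raise_for_status()
        return resp.json()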

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-10-05 Thread Addshore
Addshore added a comment. It looks like there was another little flood on the 1st of October with requests being banned again: https://logstash.wikimedia.org/goto/77f3d01f6e7eaf56e5436727b5643ba2 Is there a reason that all mediawiki hosts show as "localhost"?

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-17 Thread Addshore
Addshore added a comment. In T204267#4584397, @Smalyshev wrote: All bans are temporary, so as soon as traffic returns to normal the bans will expire. It would be nice if there was a way for wbqc to respect the 429 throttling header, which would avoid bans. That sounds like a great idea that could

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Smalyshev
Smalyshev added a comment. All bans are temporary, so as soon as traffic returns to normal the bans will expire. It would be nice if there was a way for wbqc to respect the 429 throttling header, which would avoid bans.

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Tpt
Tpt added a comment. @Jonas Thank you for your feedback. "Is it necessary for your tool to run the constraint checks in parallel?" No, I am going to switch to sequential processing. Thanks for the idea! "Using WDQS instead would be a good idea, because then only your tool would get throttled."
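
Running the checks one at a time, rather than many in flight at once, bounds the load the tool puts on the backend. A minimal sketch of the sequential version against the wbcheckconstraints API action; the wrapper names, User-Agent string and one-second pause are illustrative, not Tpt's actual code:

    import time
    import requests

    API = "https://www.wikidata.org/w/api.php"
    SESSION = requests.Session()
    SESSION.headers["User-Agent"] = "corhist-like-tool/0.1 (https://example.org; contact@example.org)"  # placeholder

    def check_constraints(item_id):
        """One wbcheckconstraints call for a single item (hypothetical wrapper)."""
        resp = SESSION.get(API, params={
            "action": "wbcheckconstraints",
            "id": item_id,
            "format": "json",
        })
        resp.raise_for_status()
        return resp.json()

    def check_all(item_ids):
        # Sequential instead of parallel: at most one outstanding request,
        # plus a small pause, keeps the request rate bounded.
        results = {}
        for item_id in item_ids:
            results[item_id] = check_constraints(item_id)
            time.sleep(1)  # illustrative pacing; use whatever rate the Wikidata team is ok with
        return results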

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Jonas
Jonas added a comment. @Tpt Is it necessary for your tool to run the constraint checks in parallel? Using WDQS instead would be a good idea, because then only your tool would get throttled. The problem there is that just a fraction of all violations are in there at the moment.

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Tpt
Tpt added a comment. Sorry everyone for the troubles. I was experimenting with a tool that tries to find corrections for constraint violations. I have modified it to send a proper User-Agent for all its requests to the Wikidata API, but have not restarted it. @Wikidata team: what would be an ok request
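
A proper User-Agent is what lets operators attribute traffic to a tool and contact its maintainer (see https://meta.wikimedia.org/wiki/User-Agent_policy). A minimal sketch; the tool name, URL and contact address are placeholders:

    import requests

    session = requests.Session()
    # Descriptive User-Agent: tool name/version, a URL, and a contact address.
    session.headers["User-Agent"] = (
        "constraint-corrections-tool/0.1 "
        "(https://example.org/constraint-corrections; tools.contact@example.org)"
    )

    resp = session.get("https://www.wikidata.org/w/api.php",
                       params={"action": "wbgetentities", "ids": "Q42", "format": "json"})
    resp.raise_for_status()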

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-cloud) [2018-09-14T11:22:34Z] T204267 stop the corhist tool (k8s) because is hammering the wikidata API

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread aborrero
aborrero added a comment. In T204267#4583629, @Pintoch wrote: @aborrero thanks for the ping. I do not recognize the shape of the queries as coming from this tool though. The openrefine-wikidata tool should do relatively few SPARQL queries, whose results are cached in redis. How did you determine

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Pintoch
Pintoch added a comment. @aborrero thanks for the ping. I do not recognize the shape of the queries as coming from this tool though. The openrefine-wikidata tool should do relatively few SPARQL queries, whose results are cached in redis. How did you determine that this tool is the source of the
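
Caching query results in Redis is what keeps a tool like this to "relatively few" SPARQL requests: repeated lookups are served from the cache instead of hitting WDQS again. A minimal sketch of that pattern, assuming a local Redis instance; the key scheme, TTL and endpoint are illustrative, not taken from openrefine-wikidata:

    import hashlib
    import json

    import redis
    import requests

    WDQS = "https://query.wikidata.org/sparql"  # illustrative endpoint
    cache = redis.Redis()  # assumes Redis on localhost:6379

    def cached_sparql(query, ttl=3600):
        # Key the cache on a hash of the query text; expire entries after ttl seconds.
        key = "sparql:" + hashlib.sha256(query.encode("utf-8")).hexdigest()
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)

        resp = requests.get(WDQS, params={"query": query, "format": "json"},
                            headers={"User-Agent": "openrefine-wikidata-like/0.1 (placeholder contact)"})
        resp.raise_for_status()
        data = resp.json()
        cache.setex(key, ttl, json.dumps(data))
        return data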

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Stashbot
Stashbot added a comment. Mentioned in SAL (#wikimedia-cloud) [2018-09-14T10:51:35Z] T204267 stop the openrefine-wikidata tool (k8s) because is hammering the wikidata API

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Addshore
Addshore added a comment. Looks like it is tools-worker-1021.tools.eqiad.wmflabs

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-14 Thread Jonas
Jonas added a comment. Seems the #wikibase-quality-constraints extension is the source: https://grafana.wikimedia.org/dashboard/db/wikidata-quality?refresh=10s&orgId=1&from=now-30d&to=now

[Wikidata-bugs] [Maniphest] [Commented On] T204267: Flood of WDQS requests from wbqc

2018-09-13 Thread Smalyshev
Smalyshev added a comment. Kibana log for banned requests. Example request: PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wds: <http://www.wikidata.org/entity/statement/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> PREFIX wdv: <http://www.wikidata.org/value/> PREFIX p: <http://www.wikidata.org/prop/> PREFIX ps: <http://www.wikidata.org/prop/statement/> PREFIX pq: <http://www.wikidata.org/prop/qualifier/> PREFIX pqv: <http://www.wikidata.org/prop/qualifier/value/> PREFIX pr: <http://www.wikidata.org/prop/reference/> PREFIX prv: <http://www.wikidata.org/prop/reference/value/> PREFIX wikibase: <http://wikiba.se/ontology#> PREFIX wikibase-beta: <http://wikiba.se/ontology-beta#> SELECT (REGEX("533892", "^(?:[1-9][0-9]+|)$") AS
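
Note that this query never touches the data in the store: it asks Blazegraph to evaluate a regular-expression match on an inline literal (a format-constraint check). The same check done in-process, using Python's re for illustration (its dialect differs slightly from the XPath-style regexes SPARQL's REGEX uses):

    import re

    # Equivalent of REGEX("533892", "^(?:[1-9][0-9]+|)$") from the query above.
    print(bool(re.match(r"^(?:[1-9][0-9]+|)$", "533892")))  # True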