Smalyshev created this task.
Smalyshev added projects: Wikidata-Query-Service, Traffic.
Herald added a subscriber: Aklapper.
Herald added projects: Operations, Wikidata, Discovery.

TASK DESCRIPTION

Some sources send the query service a large number of very heavy queries in rapid succession, bringing down the service. We could either drastically lower the limits (which would exclude legitimate users with longer queries, or workloads that do not harm the service) or block the specific source until they fix their scripts.

Example:

141.2.108.223 repeatedly sends queries like:

SELECT ?cid ?article ?gnd WHERE { ?cid wdt:P227 ?gnd. ?article schema:about ?cid . filter( regex(str(?article), "en.wikipedia.org" )) } limit 1000 offset 185000

at a rate of tens of thousands per hour. E.g. on 2017-07-17, during hour 16, there were 13970 such requests. Most of them are blocked by the IP limits, but unfortunately hundreds still get through and harm the service because of the very large offset and the large amount of data returned (they may also be timing out and therefore not being cached).
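A per-source, per-hour count like the one above could be derived from access logs roughly as follows. This is only a sketch: the (timestamp, IP) record layout and the names `requests_per_ip_hour` and `heavy_sources` are assumptions for illustration, not the actual log format or tooling used for the query service.

```python
from collections import Counter
from datetime import datetime


def requests_per_ip_hour(records):
    """Count requests per (ip, hour) bucket.

    `records` is an iterable of (iso_timestamp, ip) pairs. This layout is a
    hypothetical stand-in for whatever the real access logs provide.
    """
    counts = Counter()
    for iso_timestamp, ip in records:
        ts = datetime.fromisoformat(iso_timestamp)
        # Truncate the timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[(ip, bucket)] += 1
    return counts


def heavy_sources(counts, threshold):
    """Return (ip, hour, count) for buckets above the threshold, busiest first."""
    flagged = [(ip, hour, n) for (ip, hour), n in counts.items() if n > threshold]
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

The threshold is left to the caller, since a sensible value depends on what the service can tolerate.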

This query obviously doesn't need to be repeated 10K+ times in a short period, so it looks like a broken bot. I'd like to block it and to have tools for blocking such occurrences in the future. It doesn't happen often, but when it does, it damages the whole service.
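One possible starting point for such a tool is to group requests by source and by query "shape", treating queries that differ only in their LIMIT/OFFSET values as the same query. The sketch below assumes requests are available as (IP, query text) pairs; the names `query_shape` and `repeated_shapes` are hypothetical, and the regex-based normalization is a rough heuristic, not a SPARQL parser (it would also rewrite "limit 10" inside a string literal).

```python
import re
from collections import Counter

# Matches a LIMIT or OFFSET keyword followed by its numeric argument.
_PAGING = re.compile(r"\b(limit|offset)\s+\d+", re.IGNORECASE)


def query_shape(query):
    """Normalize a query so that paged variants of it compare equal."""
    shape = _PAGING.sub(lambda m: m.group(1).lower() + " N", query)
    # Collapse runs of whitespace so formatting differences do not matter.
    return " ".join(shape.split())


def repeated_shapes(requests, min_repeats):
    """Return {(ip, shape): count} for shapes an IP sent at least min_repeats times."""
    counts = Counter((ip, query_shape(query)) for ip, query in requests)
    return {key: n for key, n in counts.items() if n >= min_repeats}
```

Sources flagged this way would still need human review before blocking, since a legitimate client paging through results once produces the same pattern at a much lower volume.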


TASK DETAIL
https://phabricator.wikimedia.org/T170860

To: Smalyshev
Cc: BBlack, Gehel, Aklapper, Smalyshev, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, Avner, Zppix, debt, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi