Re: [Wikidata-tech] How to respect throttling and retry-after headers on the Wikidata Query Service.
Hi Andra,

Pywikibot should take care of that for you, see https://github.com/wikimedia/pywikibot/blob/master/pywikibot/comms/http.py#L317

Maarten

On 02-11-19 11:36, Andra Waagmeester wrote:
> Hi,
>
> I hope this is the right mailing list to discuss this issue.
>
> Some time ago I ran into a series of temporary bans. I thought I had tackled this by doing a full stop as soon as I got any HTTP status code other than 200. However, this does not seem to have fixed it, since I received the following message:
>
> "requests.exceptions.HTTPError: 403 Client Error: You have been banned until 2019-10-18T10:21:36.495Z, please respect throttling and retry-after headers. for url: https://query.wikidata.org/sparql"
>
> I am now looking into this from scratch to see if I can implement a better solution, one that actually respects the retry-after time instead of stopping completely. Whatever I try now, I keep getting 200 responses, and I don't want to start an excessive bot run just to get banned and see the exact header the bot needs to respect. Is there an example of such a header that I can use in my own test script? Or is there example Python code that successfully deals with a Retry-After header?
>
> Regards,
> Andra

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
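On the question of what such a header looks like: per the HTTP specification, Retry-After carries either a number of seconds (e.g. `Retry-After: 60`) or an HTTP date. The following is a minimal stdlib-only sketch of a parser, not Pywikibot's implementation; `parse_retry_after` and `DEFAULT_RETRY_AFTER` are hypothetical names, and falling back to 60 seconds when the header is missing or unparseable is an assumption (60 seconds is the upper bound mentioned later in this thread). Feeding it hand-written header values lets a test script exercise the wait logic without provoking a real ban.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical fallback (seconds) when the header is missing or unparseable.
DEFAULT_RETRY_AFTER = 60.0


def parse_retry_after(value, default=DEFAULT_RETRY_AFTER, now=None):
    """Convert a Retry-After header value into a number of seconds to wait.

    HTTP allows two forms: delta-seconds ("120") or an HTTP-date
    ("Fri, 18 Oct 2019 10:21:36 GMT").
    """
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Python < 3.10 raises TypeError, newer versions ValueError, on bad input.
        return default
    if when.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - now).total_seconds())


# Hand-written headers for a test script; no real ban needed.
for header in ({"Retry-After": "60"}, {"Retry-After": "Fri, 18 Oct 2019 10:21:36 GMT"}, {}):
    print(header, parse_retry_after(header.get("Retry-After")))
```

The exact header values WDQS sends while throttling are not shown in this thread, so the sample values above are illustrative only.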
Re: [Wikidata-tech] How to respect throttling and retry-after headers on the Wikidata Query Service.
Hello!

Sorry for the late reply.

On Sat, Nov 2, 2019 at 12:31 PM Andra Waagmeester wrote:

> Thanks for your prompt response. I wasn't filtering for 429, but only for
> 503, so that might explain it.
> This is my current countermeasure against overloading the system:
>
> https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1179

With only a quick look at the code, it looks good enough to me. A few things you might want to improve:

* L1148 [1]: use a default retry_after of 60 seconds instead of 30. That's the upper bound of what our throttling will ask you for.
* L1186-L1189 [2]: in case of a 429, you can read the "retry-after" header to get the sleep value that our throttling expects.

>> If you follow all that, you should be good. If you still see throttling /
>> ban, let us know. If you give me the User-Agent of your script and the time
>> at which you received the throttling / ban response, I can have a look
>> into the logs.
>
> Where do I let you know? Is this email list the right place to do so?

This list is the right place. Or you can contact me directly if you want, but others might benefit from this discussion being public.

[1] https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1148
[2] https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1186-L1189

> Regards,
>
> Andra

-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST
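The two suggestions above (a 60-second default, and reading Retry-After on a 429) can be combined into a retry loop. This is a minimal sketch, not WikidataIntegrator's or Pywikibot's code: `fetch_with_retry`, `BannedError` and the `send` callable are hypothetical names, `send` is assumed to return `(status_code, headers, body)`, and treating every 403 as a ban that should stop the bot is an assumption based on the 403 error message quoted earlier in the thread. Only the delta-seconds form of Retry-After is parsed; anything else falls back to 60 seconds.

```python
import time

# Fallback wait in seconds when Retry-After is missing or not an integer;
# 60 s is the upper bound of what WDQS throttling asks for, per this thread.
DEFAULT_RETRY_AFTER = 60

# Status codes treated as "slow down and try again".
RETRYABLE_STATUSES = {429, 503}


class BannedError(Exception):
    """Hypothetical exception: a 403 is treated as a ban, so stop instead of retrying."""


def fetch_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until the response is not throttled.

    send is a hypothetical zero-argument callable returning
    (status_code, headers, body). headers should be case-insensitive
    (as requests' resp.headers is); with a plain dict, use the exact key.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status == 403:
            # Assumption: in this thread a ban came back as 403.
            raise BannedError(body)
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt == max_attempts:
            break
        try:
            # Only the delta-seconds form is handled; an HTTP-date fails int().
            delay = max(0, int(headers.get("Retry-After")))
        except (TypeError, ValueError):
            delay = DEFAULT_RETRY_AFTER
        sleep(delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

With requests, a `send` function could call `requests.get(...)` and return `(resp.status_code, resp.headers, resp.text)`; `resp.headers` is case-insensitive, so the `"Retry-After"` lookup works however the server capitalizes it. Injecting `sleep` keeps the loop testable without actually waiting.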
Re: [Wikidata-tech] How to respect throttling and retry-after headers on the Wikidata Query Service.
Thanks for your prompt response. I wasn't filtering for 429, only for 503, so that might explain it. This is my current countermeasure against overloading the system:

https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1179

> If you follow all that, you should be good. If you still see throttling /
> ban, let us know. If you give me the User-Agent of your script and the time
> at which you received the throttling / ban response, I can have a look
> into the logs.

Where do I let you know? Is this email list the right place to do so?

Regards,
Andra
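Because throttling and bans can only be traced in the logs by the script's User-Agent, it helps to send a distinctive one; Wikimedia's User-Agent policy asks automated clients to identify themselves and give contact information. The sketch below only builds the request with the standard library. `ExampleBot`, the example.org URL and e-mail address, and `build_sparql_request` are hypothetical placeholders to replace with your own.

```python
from urllib.parse import urlencode
from urllib.request import Request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical identifier: replace with your tool name, version and contact details.
USER_AGENT = "ExampleBot/0.1 (https://example.org/examplebot; examplebot@example.org)"


def build_sparql_request(query):
    """Build (but do not send) a GET request to WDQS with a descriptive User-Agent."""
    params = urlencode({"query": query})
    return Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "User-Agent": USER_AGENT,
            # Ask for SPARQL JSON results.
            "Accept": "application/sparql-results+json",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` raises `urllib.error.HTTPError` for 4xx and 5xx responses; the exception's `code` and `headers` attributes carry the status and any Retry-After header.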