Re: [Wikidata-tech] How to respect throttling and retry-after headers on the Wikidata Query Service.

2019-11-17 Thread Maarten Dammers

Hi Andra,

Pywikibot should take care of that for you, see 
https://github.com/wikimedia/pywikibot/blob/master/pywikibot/comms/http.py#L317 



Maarten

On 02-11-19 11:36, Andra Waagmeester wrote:

Hi,

    I hope this is the right mailing list to discuss this issue.
Some time ago I ran into a series of temporary bans. I thought I had
managed to tackle this by doing a full stop whenever I got any HTTP
status code other than 200.


However, this seems not to have fixed it, since I received the 
following message:


"requests.exceptions.HTTPError: 403 Client Error: You have been banned 
until 2019-10-18T10:21:36.495Z, please respect throttling and 
retry-after headers. for url: https://query.wikidata.org/sparql"
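The ban message itself states when the ban ends. Below is a minimal Python sketch that extracts that timestamp from the error text and converts it into the number of seconds to wait. It assumes the message always has the exact "banned until <ISO timestamp>Z" shape quoted above, which is based on this one example only. `seconds_until_unbanned` is a hypothetical helper and is not part of requests or any Wikimedia library.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed shape of the ban text, based only on the single example above:
# "... You have been banned until 2019-10-18T10:21:36.495Z, please respect ..."
BAN_RE = re.compile(
    r"banned until (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?Z")


def seconds_until_unbanned(message, now=None):
    """Seconds left before the ban in `message` expires (0.0 if already over).

    Returns None when the message contains no "banned until" timestamp.
    """
    match = BAN_RE.search(message)
    if match is None:
        return None
    until = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    until = until.replace(tzinfo=timezone.utc)  # the trailing "Z" means UTC
    if match.group(2):
        until += timedelta(seconds=float(match.group(2)))  # e.g. ".495"
    now = now if now is not None else datetime.now(timezone.utc)
    return max(0.0, (until - now).total_seconds())
```

The result could be fed to `time.sleep()` (perhaps plus a small margin) before retrying. Note that the sketch trusts the local clock to be roughly in sync with the server's clock.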


I am now looking into this from scratch to see if I can implement a
better solution, certainly one that actually respects the retry-after
time instead of stopping completely.


Whatever I try now, I keep getting 200 responses, and I don't want to
start an excessive bot run just to get banned and see the exact
header that the bot needs to respect.


Is there an example of such a header which I can use to make my own 
test script?
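One way to test without provoking the real service is a small local stand-in server. The sketch below uses only the Python standard library. It answers the first request with 429 and a `Retry-After: 2` header, and it answers later requests with 200. The status code, header value and JSON body are made-up test values and were not captured from the Wikidata Query Service. `ThrottlingHandler` and `start_fake_server` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ThrottlingHandler(BaseHTTPRequestHandler):
    """First GET gets 429 + Retry-After, later GETs get 200 (test values only)."""

    calls = 0  # shared counter across requests

    def do_GET(self):
        type(self).calls += 1
        if type(self).calls == 1:
            self.send_response(429)
            self.send_header("Retry-After", "2")  # made-up delay in seconds
            self.end_headers()
            self.wfile.write(b"Too Many Requests")
        else:
            self.send_response(200)
            self.send_header("Content-Type", "application/sparql-results+json")
            self.end_headers()
            self.wfile.write(b'{"head": {"vars": []}, "results": {"bindings": []}}')

    def log_message(self, *args):
        pass  # keep test output quiet


def start_fake_server():
    """Start the stand-in server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ThrottlingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A retry loop can then be pointed at `http://127.0.0.1:<port>/sparql`, where the port is `server.server_address[1]`, and checked for whether it waits and retries instead of stopping.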


Or is there example Python code that successfully deals with a
Retry-After header?
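Below is a minimal sketch of such a loop. It assumes a requests-style session, meaning a `.get()` method that returns an object with `.status_code`, `.headers` and `.raise_for_status()`. `get_with_retry` is a hypothetical name. The loop retries only on 429 and 503 (the two codes discussed in this thread) and reads Retry-After as whole seconds. When the header is missing or not a number, it falls back to a 60-second wait.

```python
import time

# Codes treated as "back off and retry"; anything else (e.g. a 403 ban) is raised.
RETRYABLE_STATUSES = {429, 503}


def get_with_retry(session, url, params=None, max_retries=5,
                   default_wait=60, sleep=time.sleep):
    """GET `url`, sleeping for Retry-After seconds on 429/503 responses.

    `session` is assumed to behave like requests.Session. `sleep` is a
    parameter only so the loop can be tested without really waiting.
    """
    for attempt in range(max_retries + 1):
        response = session.get(url, params=params)
        retryable = response.status_code in RETRYABLE_STATUSES
        if not retryable or attempt == max_retries:
            response.raise_for_status()  # raises for 4xx/5xx, including a 403 ban
            return response
        try:
            # Only the integer-seconds form of Retry-After is handled here.
            wait = max(0, int(response.headers.get("Retry-After", "")))
        except ValueError:
            wait = default_wait
        sleep(wait)
```

With requests, the call might look roughly like `get_with_retry(requests.Session(), "https://query.wikidata.org/sparql", params={"query": query, "format": "json"})`. A 403 is deliberately not retried. After a ban, more requests are unlikely to help.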


Regards,

Andra



___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata-tech] How to respect throttling and retry-after headers on the Wikidata Query Service.

2019-11-06 Thread Guillaume Lederrey
Hello!

Sorry for the late reply.

On Sat, Nov 2, 2019 at 12:31 PM Andra Waagmeester  wrote:

> Thanks for your prompt response. I wasn't filtering for 429, but only for
> 503, so that might explain it.
> This is my current countermeasure against overloading the system:
>
>
> https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1179
>

From a quick look, the code looks good enough to me. A few
things you might want to improve:

* L1148 [1]: use a default retry_after of 60 seconds instead of 30. That's
the upper bound of what our throttling will ask you to wait.
* L1186-L1189 [2]: in case of a 429, you can check the "Retry-After" header to
get the exact sleep time that our throttling expects.
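Retry-After is not always a plain number. HTTP allows it to be either a count of seconds or an HTTP-date (RFC 7231, section 7.1.3). Below is a hedged sketch of a parser that accepts both forms and falls back to the 60-second default suggested in the first point. `parse_retry_after` is a hypothetical helper. This thread does not say which form WDQS actually sends.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_RETRY_AFTER = 60  # seconds, the default suggested on the list


def parse_retry_after(value, now=None, default=DEFAULT_RETRY_AFTER):
    """Turn a Retry-After header value into a number of seconds to sleep.

    Accepts delta-seconds ("120") or an HTTP-date
    ("Wed, 06 Nov 2019 10:00:00 GMT"); a missing or unparsable
    value gives `default`.
    """
    if not value:
        return default
    try:
        return max(0, int(value))            # delta-seconds form
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError, IndexError):
        return default                       # error type varies by Python version
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now if now is not None else datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))
```

A date that is already in the past yields 0, so the caller can retry immediately.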



>
>
>> If you follow all that, you should be good. If you still see throttling /
>> ban, let us know. If you give me the User-Agent of your script and the time
>> at which you received the throttling / ban response, I can have a look
>> into the logs.
>>
>>
> Where do I let you know? Is this email list the right place to do so?
>

This list is the right place. Or you can contact me directly if you want.
But others might benefit from this discussion being public.
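The logs are looked up by User-Agent, so a script should send a descriptive one. Wikimedia's User-Agent policy asks for a client name, version and contact information. The sketch below builds a string in roughly that shape. `build_user_agent`, the tool name and the contact details are placeholders.

```python
import platform


def build_user_agent(tool, version, contact):
    """Return '<tool>/<version> (<contact>) Python/<python version>'."""
    return "%s/%s (%s) Python/%s" % (
        tool, version, contact, platform.python_version())


# Hypothetical values; replace with your bot's real name and contact details.
USER_AGENT = build_user_agent(
    "ExampleBot", "0.1", "https://example.org/bot; bot@example.org")
# With requests, e.g.: session.headers["User-Agent"] = USER_AGENT
```

When reporting a ban on this list, include the same string together with the time of the response.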

[1] https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1148
[2] https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1186-L1189



> Regards,
>
> Andra
>


-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST


Re: [Wikidata-tech] How to respect throttling and retry-after headers on the Wikidata Query Service.

2019-11-02 Thread Andra Waagmeester
Thanks for your prompt response. I wasn't filtering for 429, but only for
503, so that might explain it.
This is my current countermeasure against overloading the system:

https://github.com/SuLab/WikidataIntegrator/blob/v0.4.3/wikidataintegrator/wdi_core.py#L1179



> If you follow all that, you should be good. If you still see throttling /
> ban, let us know. If you give me the User-Agent of your script and the time
> at which you received the throttling / ban response, I can have a look
> into the logs.
>
>
Where do I let you know? Is this email list the right place to do so?

Regards,

Andra