Re: [Wikidata] Overload of query.wikidata.org (Guillaume Lederrey)

2019-06-21 Thread Imre Samu
Good news :)
- the issue has been fixed in the 0.6.7 release, and wdtaxonomy is working again!
https://github.com/nichtich/wikidata-taxonomy/commit/97abd4158b3c4ba9cd2c53503ca6b8b2ca29bc2a

Imre



Stas Malyshev wrote (on Wed, 19 Jun 2019, 0:22):

> Hi!
>
> On 6/18/19 2:29 PM, Tim Finin wrote:
> > I've been using wdtaxonomy
> >  happily for many months
> > on my macbook. Starting yesterday, every call I make (e.g., "wdtaxonomy
> > -c Q5") produces an immediate "SPARQL request failed" message.
>
> Could you provide more details, which query is sent and what is the full
> response (including HTTP code)?
>
> >
> > Might these requests be blocked now because of the new WDQS policies?
>
> One thing I can think of is that this tool does not send a proper
> User-Agent header. According to
> https://meta.wikimedia.org/wiki/User-Agent_policy, all clients should
> identify themselves with a valid user agent. We've started enforcing this
> recently, so maybe this tool has that issue. If not, please provide the
> data above.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org (Guillaume Lederrey)

2019-06-19 Thread Tim Finin
I've been using wdtaxonomy happily for many months on my MacBook running
macOS 10.14.5. Starting yesterday, every call I make (e.g., "wdtaxonomy -c
Q5") produces an immediate "SPARQL request failed" message.

Might these requests be blocked now because of the new WDQS policies?

Tim
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org (Guillaume Lederrey)

2019-06-18 Thread Stas Malyshev
Hi!

On 6/18/19 2:29 PM, Tim Finin wrote:
> I've been using wdtaxonomy
>  happily for many months
> on my macbook. Starting yesterday, every call I make (e.g., "wdtaxonomy
> -c Q5") produces an immediate "SPARQL request failed" message.

Could you provide more details: which query is sent, and what is the full
response (including the HTTP code)?

> 
> Might these requests be blocked now because of the new WDQS policies?

One thing I can think of is that this tool does not send a proper
User-Agent header. According to
https://meta.wikimedia.org/wiki/User-Agent_policy, all clients should
identify themselves with a valid user agent. We've started enforcing this
recently, so maybe this tool has that issue. If not, please provide the
data above.
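
For reference, here is a minimal sketch of such a request - assuming Python 3
and the requests library; the query, client name, and contact address below
are illustrative placeholders, not what wdtaxonomy actually sends - that
prints the HTTP status code while identifying the client per the User-Agent
policy:

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Identify the client, per https://meta.wikimedia.org/wiki/User-Agent_policy
HEADERS = {
    "User-Agent": "taxonomy-debug/0.1 (https://example.org/contact; user@example.org)",
    "Accept": "application/sparql-results+json",
}

# Illustrative query, roughly what "wdtaxonomy -c Q5" asks for: subclasses of Q5
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P279 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

response = requests.get(WDQS_ENDPOINT, params={"query": QUERY}, headers=HEADERS)
print(response.status_code)   # e.g. 200, or 403 if the user agent is being rejected
print(response.text[:500])    # start of the response body

With a generic or missing User-Agent the same request may now be rejected,
which would explain an immediate "SPARQL request failed".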

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org (Guillaume Lederrey)

2019-06-18 Thread Tim Finin
I've been using wdtaxonomy happily for many months on my MacBook. Starting
yesterday, every call I make (e.g., "wdtaxonomy -c Q5") produces an immediate
"SPARQL request failed" message.

Might these requests be blocked now because of the new WDQS policies?

Tim
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org

2019-06-17 Thread Gerard Meijssen
Hoi,
I make use of the SourceMD environment; it is well behaved, allows for
throttling, and when I have multiple jobs it runs only one at a time. I do
understand that my jobs are put on hold when the situation warrants it; I
even put them on hold myself when I think of it.

When someone else puts my jobs on hold, I cannot release them at a better
time, and I now have seven jobs doing nothing. A new job progresses
normally. The point is that management is fine, but given that what I do is
well behaved, I expect my jobs to run and, when held, to be released at a
later time. When I cannot depend on jobs finishing, my work is not finished,
and I do not know whether I should run more jobs, or which jobs, to get the
data to a finished state.
Thanks,
GerardM

On Tue, 18 Jun 2019 at 06:35, Stas Malyshev  wrote:

> Hi!
>
> > We are currently dealing with a bot overloading the Wikidata Query
> > Service. This bot does not look actively malicious, but it does create
> > enough load to disrupt the service. As a stopgap measure, we had to
> > deny access to all bots using the python-requests user agent.
> >
> > As a reminder, any bot should use a user agent that allows us to identify
> > it [1]. If you have trouble accessing WDQS, please check that you are
> > following those guidelines.
>
> To add to this, we have had this trouble because two events that WDQS
> currently does not deal well with have coincided:
>
> 1. An edit bot that edited at 200+ edits per minute. This is too much.
> Over 60/min is almost always too much. It is also worth considering, if
> your bot makes multiple changes (e.g. adds multiple statements), doing
> them in one call instead of several, since WDQS currently performs an
> update for each change separately, and this may be expensive. We're
> looking into various improvements here, but that is the current state.
>
> 2. Several bots have been flooding the service query endpoint with
> requests. There has recently been a growth in bots that a) completely
> ignore both the regular limits and throttling hints, b) do not have a
> proper identifying user agent, and c) use distributed hosts, so our
> throttling system has trouble dealing with them automatically. We intend
> to crack down more and more on such clients, because they look a lot
> like a DDoS and ruin the service experience for everyone.
>
> I will probably write down more detailed rules a bit later, but for now
> see:
>
> https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_constraints
> Additionally, having a distinct User-Agent if you're running a bot is a
> good idea.
>
> And for people who think it's a good idea to launch a
> max-requests-I-can-stuff-into-the-pipe bot, put it on several Amazon
> machines so that throttling has a hard time detecting it, and then, when
> throttling does detect it, neglect to check for a week that all the bot
> is doing is fetching 403s from the service and wasting everybody's
> time - please think again. If you want to do something non-trivial when
> querying WDQS and the limits get in the way, please talk to us (and if
> you know somebody who isn't reading this list but is considering writing
> a bot interfacing with WDQS, please educate them and refer them to us
> for help; we would much rather help than ban). Otherwise, we'd be forced
> to put more limitations in place that will affect everyone.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org

2019-06-17 Thread Stas Malyshev
Hi!

> We are currently dealing with a bot overloading the Wikidata Query
> Service. This bot does not look actively malicious, but it does create
> enough load to disrupt the service. As a stopgap measure, we had to
> deny access to all bots using the python-requests user agent.
>
> As a reminder, any bot should use a user agent that allows us to identify
> it [1]. If you have trouble accessing WDQS, please check that you are
> following those guidelines.

To add to this, we have had this trouble because two events that WDQS
currently does not deal well with have coincided:

1. An edit bot that edited at 200+ edits per minute. This is too much.
Over 60/min is almost always too much. It is also worth considering, if
your bot makes multiple changes (e.g. adds multiple statements), doing
them in one call instead of several, since WDQS currently performs an
update for each change separately, and this may be expensive (a sketch of
this follows after these two points). We're looking into various
improvements here, but that is the current state.

2. Several bots have been flooding the service query endpoint with
requests. There has recently been a growth in bots that a) completely
ignore both the regular limits and throttling hints, b) do not have a
proper identifying user agent, and c) use distributed hosts, so our
throttling system has trouble dealing with them automatically. We intend
to crack down more and more on such clients, because they look a lot
like a DDoS and ruin the service experience for everyone.
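
To illustrate the "one call instead of several" point in (1), here is a rough
sketch - assuming pywikibot, which is not necessarily what any bot in this
thread uses; the item, properties, and values are placeholders - that bundles
several new statements into a single edit, so WDQS only has to process one
change:

import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, "Q4115189")  # the Wikidata sandbox item

def make_claim(prop, target_qid):
    # Illustrative helper: build a statement with an item-valued target.
    claim = pywikibot.Claim(repo, prop)
    claim.setTarget(pywikibot.ItemPage(repo, target_qid))
    return claim

# Costly pattern: one API call (and one separate WDQS update) per statement.
# for prop, qid in [("P31", "Q5"), ("P106", "Q82594")]:
#     item.addClaim(make_claim(prop, qid))

# Cheaper pattern: bundle all new statements into a single edit.
claims = [make_claim("P31", "Q5"), make_claim("P106", "Q82594")]
item.editEntity(
    {"claims": [c.toJSON() for c in claims]},
    summary="Add several statements in one edit",
)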

I will probably write down more detailed rules a bit later, but for now
see:
https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_constraints
Additionally, having a distinct User-Agent if you're running a bot is a
good idea.
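
As an example of what respecting those limits can look like - a sketch only,
assuming Python and the requests library, with a placeholder User-Agent - a
client should back off when the service signals throttling (e.g. a 429
response carrying a Retry-After header) instead of immediately retrying:

import time
import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
HEADERS = {
    "User-Agent": "my-research-bot/0.1 (https://example.org/contact; user@example.org)",
    "Accept": "application/sparql-results+json",
}

def run_query(query, max_retries=5):
    # Send one query, honoring throttling hints (HTTP 429 plus Retry-After).
    for _ in range(max_retries):
        resp = requests.get(WDQS_ENDPOINT, params={"query": query}, headers=HEADERS)
        if resp.status_code == 429:
            # The service asked us to slow down; wait as long as it requests.
            time.sleep(int(resp.headers.get("Retry-After", "60")))
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("still throttled after several retries - back off for longer")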

And for people who think it's a good idea to launch a
max-requests-I-can-stuff-into-the-pipe bot, put it on several Amazon
machines so that throttling has a hard time detecting it, and then, when
throttling does detect it, neglect to check for a week that all the bot
is doing is fetching 403s from the service and wasting everybody's
time - please think again. If you want to do something non-trivial when
querying WDQS and the limits get in the way, please talk to us (and if
you know somebody who isn't reading this list but is considering writing
a bot interfacing with WDQS, please educate them and refer them to us
for help; we would much rather help than ban). Otherwise, we'd be forced
to put more limitations in place that will affect everyone.

-- 
Stas Malyshev
smalys...@wikimedia.org

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org

2019-06-17 Thread Guillaume Lederrey
No, there isn't any prioritization. Updates are guaranteed, as they stay in
the update queue if they could not be written, but both reads and writes are
impacted by resource saturation.

On Mon, 17 Jun 2019, 15:35 Gerard Meijssen, 
wrote:

> Hoi,
> Does this mean that the retrieval of data has priority over updates ?
> Thanks,
>   GerardM
> On Mon, 17 Jun 2019 at 14:52, Guillaume Lederrey 
> wrote:
>
>> Hello all!
>>
>> We now have an incident report [1] describing this overload of the
>> Wikidata Query Service in more detail. The ban on the python-requests
>> user agent is still in effect and will remain so until we have a
>> throttling solution in place for generic user agents.
>>
>> Thanks all for your patience!
>>
>>Guillaume
>>
>>
>> [1]
>> https://wikitech.wikimedia.org/wiki/Incident_documentation/20190613-wdqs
>>
>> On Thu, Jun 13, 2019 at 7:52 PM Guillaume Lederrey
>>  wrote:
>> >
>> > Hello all!
>> >
>> > We are currently dealing with a bot overloading the Wikidata Query
>> > Service. This bot does not look actively malicious, but it does create
>> > enough load to disrupt the service. As a stopgap measure, we had to
>> > deny access to all bots using the python-requests user agent.
>> >
>> > As a reminder, any bot should use a user agent that allows us to identify
>> > it [1]. If you have trouble accessing WDQS, please check that you are
>> > following those guidelines.
>> >
>> > More information and a proper incident report will be communicated as
>> > soon as we are on top of things again.
>> >
>> > Thanks for your understanding!
>> >
>> >Guillaume
>> >
>> >
>> > [1] https://meta.wikimedia.org/wiki/User-Agent_policy
>> >
>> > --
>> > Guillaume Lederrey
>> > Engineering Manager, Search Platform
>> > Wikimedia Foundation
>> > UTC+2 / CEST
>>
>>
>>
>> --
>> Guillaume Lederrey
>> Engineering Manager, Search Platform
>> Wikimedia Foundation
>> UTC+2 / CEST
>>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org

2019-06-17 Thread Gerard Meijssen
Hoi,
Does this mean that the retrieval of data has priority over updates?
Thanks,
  GerardM

On Mon, 17 Jun 2019 at 14:52, Guillaume Lederrey 
wrote:

> Hello all!
>
> We now have an incident report [1] describing this overload of the
> Wikidata Query Service in more detail. The ban on the python-requests
> user agent is still in effect and will remain so until we have a
> throttling solution in place for generic user agents.
>
> Thanks all for your patience!
>
>Guillaume
>
>
> [1]
> https://wikitech.wikimedia.org/wiki/Incident_documentation/20190613-wdqs
>
> On Thu, Jun 13, 2019 at 7:52 PM Guillaume Lederrey
>  wrote:
> >
> > Hello all!
> >
> > We are currently dealing with a bot overloading the Wikidata Query
> > Service. This bot does not look actively malicious, but it does create
> > enough load to disrupt the service. As a stopgap measure, we had to
> > deny access to all bots using the python-requests user agent.
> >
> > As a reminder, any bot should use a user agent that allows us to identify
> > it [1]. If you have trouble accessing WDQS, please check that you are
> > following those guidelines.
> >
> > More information and a proper incident report will be communicated as
> > soon as we are on top of things again.
> >
> > Thanks for your understanding!
> >
> >Guillaume
> >
> >
> > [1] https://meta.wikimedia.org/wiki/User-Agent_policy
> >
> > --
> > Guillaume Lederrey
> > Engineering Manager, Search Platform
> > Wikimedia Foundation
> > UTC+2 / CEST
>
>
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org

2019-06-17 Thread Guillaume Lederrey
Hello all!

We now have an incident report [1] describing this overload of the
Wikidata Query Service in more detail. The ban on the python-requests
user agent is still in effect and will remain so until we have a
throttling solution in place for generic user agents.

Thanks all for your patience!

   Guillaume


[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20190613-wdqs

On Thu, Jun 13, 2019 at 7:52 PM Guillaume Lederrey
 wrote:
>
> Hello all!
>
> We are currently dealing with a bot overloading the Wikidata Query
> Service. This bot does not look actively malicious, but it does create
> enough load to disrupt the service. As a stopgap measure, we had to
> deny access to all bots using the python-requests user agent.
>
> As a reminder, any bot should use a user agent that allows us to identify
> it [1]. If you have trouble accessing WDQS, please check that you are
> following those guidelines.
>
> More information and a proper incident report will be communicated as
> soon as we are on top of things again.
>
> Thanks for your understanding!
>
>Guillaume
>
>
> [1] https://meta.wikimedia.org/wiki/User-Agent_policy
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST



-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Overload of query.wikidata.org

2019-06-13 Thread Guillaume Lederrey
Hello all!

We are currently dealing with a bot overloading the Wikidata Query
Service. This bot does not look actively malicious, but it does create
enough load to disrupt the service. As a stopgap measure, we had to
deny access to all bots using the python-requests user agent.

As a reminder, any bot should use a user agent that allows us to identify
it [1]. If you have trouble accessing WDQS, please check that you are
following those guidelines.
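
For bot authors affected by this: the stopgap ban matches the default user
agent that the Python requests library sends, so the fix is to set a
descriptive one of your own. A minimal sketch (the bot name and contact
details are placeholders):

import requests

# Replace the default "python-requests/x.y" user agent with one that
# identifies the bot and its operator (placeholder values below).
session = requests.Session()
session.headers.update({
    "User-Agent": "my-wdqs-bot/0.1 (https://example.org/bot; operator@example.org)"
})

resp = session.get(
    "https://query.wikidata.org/sparql",
    params={"query": "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"},
    headers={"Accept": "application/sparql-results+json"},
)
print(resp.status_code)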

More information and a proper incident report will be communicated as
soon as we are on top of things again.

Thanks for your understanding!

   Guillaume


[1] https://meta.wikimedia.org/wiki/User-Agent_policy

-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata