Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-19 Thread Thad Guidry
No connections to Opencorporates, sorry.

The good news is that the data sources in Opencorporates (the Registers)
are accessible to you...sometimes in dump format.

https://opencorporates.com/registers

Hope that helps you further in your research and needs. I am not saying
it's easy :)

-Thad
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-19 Thread Luigi Assom
Hi Thad,

It is a really great project. I quote some of Sebastian's points:

> # regarding Opencorporates
> I have a critical opinion of Opencorporates. It appears to be
> open, but you actually cannot get the data. If somebody has a
> data dump, please forward it to me. Thanks.
>
> More on top, I consider Opencorporates a danger to open data. It
> appears to push open availability of data, but then it is limited
> to open licenses. Usefulness is limited as there are no free dumps
> and no possibility to duplicate it effectively. Wikipedia and
> Wikidata provide dumps and an API for exactly this reason.
> Every time somebody wants to create an open organisation dataset
> with no barriers, the existence of Opencorporates is blocking this.


I think that having the possibility to run analyses in bulk is
important.

Some data in Opencorporates are incomplete (founders, capital raised,
investors), even though some of this information is fed in by users.
Currently most data is about the US and NZ; I'd like to see the EU more
represented.

I would like to have the possibility to visualise a network of companies and
their participations, and to build bipartite graphs between persons and
companies (see the sketch below).
I will try to reach them about cooperation on such a project.

Do you have connections with them?
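
As a rough illustration of the person-company graph I have in mind, here is
a minimal sketch using networkx; all names and shareholdings are invented:

import networkx as nx
from networkx.algorithms import bipartite

# Bipartite graph: one node set for persons, one for companies.
# The data below is invented, purely for illustration.
G = nx.Graph()
G.add_nodes_from(["Alice", "Bob"], kind="person")
G.add_nodes_from(["Acme GmbH", "Globex Ltd"], kind="company")

# Edges represent participations; the share is stored as an attribute.
G.add_edge("Alice", "Acme GmbH", share=0.6)
G.add_edge("Alice", "Globex Ltd", share=0.1)
G.add_edge("Bob", "Globex Ltd", share=0.5)

# Project onto the company side: two companies are linked when they
# share at least one participant.
companies = {n for n, d in G.nodes(data=True) if d["kind"] == "company"}
company_net = bipartite.projected_graph(G, companies)
print(list(company_net.edges()))  # [('Acme GmbH', 'Globex Ltd')]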




On Thu, Oct 19, 2017 at 2:17 PM, Thad Guidry  wrote:

> Hi Luigi,
>
> Have you looked at https://opencorporates.com ?
>
> Thad
> +ThadGuidry 
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Turning Lists to Wikidata

2017-10-19 Thread Antonin Delpeuch (lists)
On 19/10/2017 12:28, Thad Guidry wrote:
> It worked fantastically with Freebase, and I see no reason why it
> couldn't be done for Wikidata, simplifying the absorption of lists into
> Wikidata.
> 
> Antonin, was it in your plans to eventually work on the schema alignment
> dialog also for uploading data back to Wikidata to complete the circle
> of life, "take and give" ?

Yes - a good part of that is implemented, but there is still some work
to do before a release.

Antonin

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata Query Service partial outage

2017-10-19 Thread Yaroslav Blanter
Thanks, Guillaume, for the clarification.

Cheers
Yaroslav


Re: [Wikidata] Wikidata Query Service partial outage

2017-10-19 Thread Guillaume Lederrey
Hello!

As far as I understand, the dispatch lag is an issue between Wikidata
and the different Wikipedias. There is no involvement of Wikidata
Query Service in this. Sjoerd probably understands that much better
than I do...

Note that this issue also caused some replication lag on one of the
Wikidata Query Service servers [1]. In that case, it was mitigated
by taking that specific server out of rotation and waiting for it to
recover before sending traffic to it again. Also note that the
Wikidata Query Service replication lag is a very different kind of lag
from the dispatch lag you were talking about. (Yes, all this is
complicated.)

Thanks for your interest!

[1] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?refresh=1m&orgId=1&from=now-7d&to=now

-- 
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata Query Service partial outage

2017-10-19 Thread Yaroslav Blanter
Thanks Sjoerd. Some en-wiki users consider the delay one more
argument that Wikidata is junk and should be thrown down the toilet, so I
was curious whether the delay was handled as part of the problem.

Cheers
Yaroslav

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-19 Thread Thad Guidry
Hi Luigi,

Have you looked at https://opencorporates.com ?

Thad
+ThadGuidry 
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Kickstartet: Adding 2.2 million German organisations to Wikidata

2017-10-19 Thread Luigi Assom
Hi,

I would like to join a thread I found in the archive:
https://lists.wikimedia.org/pipermail/wikidata//2017-October/011259.html

I worked in contextual research to facilitate knowledge transfer.

One of the domains I would like to treat is the visualisation of economic
networks.

I am seeking impact on the governance of innovation and transparency of
economic network control, and to allow SMEs or private citizens to build
their own analytics and prevent cases of collusion.

Information about business profiles is currently a premium service provided
by specialised private corporations; much of the information about
companies is public, but there is a lack of open data policy.

I would like to fill the gap and contribute to feeding Wikidata as a
repository, either in bulk or as a collective action. As a design thinker I
could contribute to designing processes to fill in data, like applications
that facilitate the process.

*Is there any guidance or clearance about such initiatives?*

I am happy to read of similar interest from Germany, Belgium and Italy; I
would like to connect.

I read that feeding Wikidata with corporate information would significantly
increase its size. Still, I think the benefit of enabling inquiry for public
governance would help distribute the governance of economic data.

Aside from public services like:
https://www.gov.uk/government/organisations/companies-house

I would like to allow data-visualisation researchers (like myself) to
uncover for the public results like:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0025995

That study relied on private partnerships to access corporate databases, so
its findings cannot be queried by the public.

*Is there a specific Wikidata policy to comply with when feeding data from
website scrapers?*

As a starter, the URI of sites with a good reputation could act as an
*identifier*.
I believe that scraping would be legitimate, since the "facts" behind the
properties listed below are public, and the organisations that collated the
data provide services (professional communities, or services augmented with
private data) that would not be in competition with building a repository.

In a way, I see Wikidata as a possibility for indexing data that can be
functional to search and discovery engines, and indexing data is an
activity that such services already run daily. I believe that enabling
public transparency would enhance open-data services.


The properties I would be interested in are (a sketch of a possible mapping
to Wikidata properties follows the list):

- TEAM (founders)
- DESCRIPTION (corporate description of products and services)
- INVESTORS (corporate and private equity)
- EMPLOYEES / INCUBATORS / ADVISORS (personal information publicly
available on the web)
- PARTICIPATED COMPANIES
- DATE of acquisition or participation in companies
- CAPITAL (if available, or in ranges)
- VAT NUMBER (or registry number)
- ADDRESS
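
For illustration, here is a rough sketch in Python of how a scraped registry
record might be mapped onto existing Wikidata properties. The property IDs
below are my own reading of the property list and should be double-checked
against the live documentation; the record itself is invented.

# Hypothetical mapping from scraped registry fields to Wikidata
# property IDs (a sketch; verify each ID before any real import).
FIELD_TO_PROPERTY = {
    "founders":       "P112",   # founded by
    "investors":      "P127",   # owned by (closest existing match?)
    "parent_company": "P749",   # parent organization
    "employees":      "P1128",  # employees
    "inception":      "P571",   # inception / registration date
    "legal_form":     "P1454",  # legal form
    "vat_number":     "P3608",  # EU VAT number (verify)
    "headquarters":   "P159",   # headquarters location
    "website":        "P856",   # official website
    "opencorporates": "P1320",  # OpenCorporates ID
}

record = {  # invented example record, as a scraper might emit it
    "name": "Example GmbH",
    "inception": "+2010-04-01T00:00:00Z/11",  # Wikibase time, day precision
    "legal_form": "Q000000",  # placeholder QID; look up the item for GmbH
    "website": "https://example.de",
}

statements = {}
for field, value in record.items():
    prop = FIELD_TO_PROPERTY.get(field)
    if prop:  # fields without a mapped property (e.g. name) are skipped
        statements[prop] = value
print(statements)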

Any other ideas for fetching the business profiles of companies?
The information should, somehow, be publicly available, since each company
reports to the organisation registry, and there are already private
companies offering analytics over these business profiles.



Luigi
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Turning Lists to Wikidata

2017-10-19 Thread Thad Guidry
Hi Stas and Antonin,

Regarding triple storage of list-like data...

In fact, the primary motivation for developing OpenRefine was as an
importer tool for list-like data to be uploaded into Freebase.
I had a lot of difficulty with Freebase's earlier importer tool, which did
not allow much flexibility, and I was adamant and vocal in my complaints to
Freebase staff to "give us better importing tools".
OpenRefine was born from those discussions and from working with Freebase
staff to develop and design Gridworks, later Google Refine, now OpenRefine.

Lists are just rows of individual facts or statements that need to be
aligned against a schema (a small sketch below shows the idea).
So having a schema alignment dialog, as we had in OpenRefine against the
Freebase schema, will be important for absorbing lists, aligning them, and
uploading them into Wikidata's triple store.
A fluid schema alignment dialog was exactly the core feature the previous
Freebase importer tool lacked.

It worked fantastically with Freebase, and I see no reason why it
couldn't be done for Wikidata, simplifying the absorption of lists into
Wikidata.

Antonin, was it in your plans to eventually work on the schema alignment
dialog also for uploading data back to Wikidata to complete the circle of
life, "take and give" ?

Thad
+ThadGuidry 

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata Query Service partial outage

2017-10-19 Thread Sjoerd de Bruin
Hi Yaroslav,

No, but there have been some dispatch issues in the last few days. The
current lag for enwiki is 3 hours, for example. You can see a graph of the
dispatch lag here:
https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch?refresh=1m&orgId=1&from=now-7d&to=now

Greetings,

Sjoerd de Bruin
sjoerddebr...@me.com

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata Query Service partial outage

2017-10-19 Thread Yaroslav Blanter
Thanks Guillaume,

Is this the same incident that caused an hour's delay of Wikidata items on
Wikipedia watchlists?

Cheers
Yaroslav

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Wikidata Query Service partial outage

2017-10-19 Thread Guillaume Lederrey
Hello all!

As you might have seen / endured, we had a Wikidata Query Service
partial outage yesterday morning (Central European Time). The full
incident report is available [1] if you are interested in the details.
The short version:

* a single client started to run an unusually high number of queries on WDQS
* the overload was not prevented by our current throttling
* the failure was not detected and isolated automatically

To prevent this from happening again, we will review our throttling
rules. Those rules were previously tuned to prevent a single client
from overloading the service with a small number of expensive
requests: we started to log a client's activity only when the duration
of a request exceeded 10 seconds, which means that a client sending
tons of short requests would never be throttled.

We will correct that by lowering the threshold, probably to 25 ms. The
throttling rules themselves stay the same:

* 60 seconds of processing time per minute (peaking at 120 seconds)
* 30 errors per minute (peaking at 60)

If you are using WDQS to make lots of small requests, and you are over
the throttling rates above, there is a chance that you will start
seeing throttling errors. We are not doing this to bother you, we're
just trying to keep another crash from happening...

If you are throttled, you will receive an HTTP 429 error code [2]. This
response includes the "Retry-After" HTTP header, which specifies the number
of seconds you should wait before retrying.
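
For client authors, here is a minimal sketch of honoring that header with
Python's requests library against the public WDQS SPARQL endpoint; the
query, user agent, and retry policy are placeholders, not an official
recommendation:

import time
import requests

WDQS = "https://query.wikidata.org/sparql"

def run_query(query, max_attempts=5):
    # Run a WDQS query, sleeping as instructed on HTTP 429.
    headers = {"User-Agent": "example-bot/0.1 (your-contact@example.org)"}
    for attempt in range(max_attempts):
        resp = requests.get(WDQS, params={"query": query, "format": "json"},
                            headers=headers)
        if resp.status_code == 429:
            # Retry-After here is a number of seconds (per the text above).
            wait = int(resp.headers.get("Retry-After", "60"))
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("still throttled after %d attempts" % max_attempts)

results = run_query("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")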

Thanks for your patience!

And contact me if you want any clarification.

  Guillaume

[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20171018-wdqs
[2] https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#429

-- 
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata