Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Markus Krötzsch

On 19.11.2015 10:40, Gerard Meijssen wrote:

> Hoi,
> Because once it is a requirement and not a recommendation, it will be
> impossible to reverse this. The insidious creep of more rules and
> requirements will make Wikidata increasingly less of a wiki. Arguably
> most of the edits done by bot are of a higher quality than those done by
> hand. It is for the people maintaining the SPARQL environment to ensure
> that it is up to the job, as it does not affect Wikidata itself.
>
> I think each of these arguments holds its own. Together they are
> hopefully potent enough to prevent such silliness.


Maybe it would not be that bad. I actually think that many bots right 
now are slower than they could be because they are afraid of overloading 
the site. If bots checked the lag, they could operate close to the 
maximum load that the site can currently handle, which is probably more 
than most bots are doing now.
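In practice, "checking the lag" usually means sending the standard maxlag parameter with every API write and backing off whenever the servers reject the request. A minimal sketch, with `post` as a stand-in for whatever HTTP layer a bot actually uses (the function names here are illustrative, not from any particular framework):

```python
import time

def maxlag_backoff(data, headers, default_wait=5):
    """Return seconds to wait before retrying, or None if the API
    response is not a maxlag rejection."""
    if data.get("error", {}).get("code") != "maxlag":
        return None
    # The servers suggest a wait time via the Retry-After header.
    return int(headers.get("Retry-After", default_wait))

def edit_with_maxlag(post, params, max_retries=5):
    """Send one write through `post` (any callable that POSTs to
    api.php and returns (json_body, headers)), honouring maxlag."""
    params = dict(params, maxlag=5, format="json")
    for _ in range(max_retries):
        data, headers = post(params)
        wait = maxlag_backoff(data, headers)
        if wait is None:
            return data  # not lag-limited: done
        time.sleep(wait)
    raise RuntimeError("gave up after repeated maxlag rejections")
```

With maxlag=5, the API refuses writes whenever replication lag exceeds five seconds, so a bot written this way automatically slows down exactly when the site is struggling and speeds up when it is not.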


The "requirement" vs. "recommendation" thing is maybe not so relevant, 
since bot rules (mandatory or not) are currently not enforced in any 
strong way. Basically, the whole system is based on mutual trust, and 
this is how it should stay.


Markus

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Gerard Meijssen
Hoi,
Because once it is a requirement and not a recommendation, it will be
impossible to reverse this. The insidious creep of more rules and
requirements will make Wikidata increasingly less of a wiki. Arguably most
of the edits done by bot are of a higher quality than those done by hand.
It is for the people maintaining the SPARQL environment to ensure that it
is up to the job, as it does not affect Wikidata itself.

I think each of these arguments holds its own. Together they are hopefully
potent enough to prevent such silliness.

Thanks,
 GerardM




Re: [Wikidata] WDQS updates have stopped

2015-11-19 Thread Lukas Benedix
Is there any evidence that the quality of bot edits is higher than that of
edits by humans?

LB



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Lydia Pintscher
On Tue, Nov 17, 2015 at 7:39 PM, James Heald  wrote:
> Any idea what's going on with the SPARQL service ?
>
> Usually the data gets updated every minute or two, but it's over 11 hours
> now.

My best guess looking at things right now is that SuccuBot is making a
huge number of edits and the updater for the query service might not
be able to handle that yet. Stas: Could you have a look?


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Society for the Promotion of Free Knowledge,
registered association (e. V.). Registered in the register of associations
of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized
as charitable by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Stas Malyshev
Hi!

>> Usually the data gets updated every minute or two, but it's over 11 hours
>> now.
> 
> My best guess looking at things right now is that SuccuBot is making a
> huge number of edits and the updater for the query service might not
> be able to handle that yet. Stas: Could you have a look?

Yes, it looks like there's a large volume of updates, so the service is
several hours behind, but it seems to be catching up now. What is the bot
doing?

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Lydia Pintscher
On Wed, Nov 18, 2015 at 4:44 PM, Stas Malyshev  wrote:
> Yes, it looks like there's a large volume of updates, so the service is
> several hours behind, but it seems to be catching up now. What is the bot
> doing?

https://www.wikidata.org/wiki/Special:Contributions/SuccuBot


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Stas Malyshev
Hi!

>> Yes, it looks like there's a large volume of updates, so the service is
>> several hours behind, but it seems to be catching up now. What is the bot
>> doing?
> 
> https://www.wikidata.org/wiki/Special:Contributions/SuccuBot

The last set of edits seems suspect to me, e.g. adding copies of the en
label to a bunch of species items as the ru label, without them even having
a ruwiki entry. I'm not sure it's a good thing, but yes, that would
generate quite a big load on updating, especially since it seems to be
adding hundreds (if not thousands of) labels per minute. I've also added a
note on Succu's talk page to discuss it.

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Lydia Pintscher
On Wed, Nov 18, 2015 at 7:03 PM, Andra Waagmeester  wrote:
> How do you add "hundreds (if not thousands)" of items per minute? We
> typically see speeds of 20-30 items per minute with our bot account. For
> our purposes it would be convenient if that number could be increased.

That rate is preferred for technical reasons (dispatching to Wikipedia and
co. and updating the query service) as well as social ones (the ability to
review changes).
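For comparison, this is roughly the rate that common bot frameworks produce out of the box. In Pywikibot, for instance, the relevant knobs live in user-config.py; the values below are illustrative, not official guidance:

```python
# user-config.py (Pywikibot), illustrative values only
put_throttle = 2  # minimum seconds between writes, i.e. roughly 30 edits/minute
maxlag = 5        # back off while server replication lag exceeds 5 seconds
```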


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Andra Waagmeester
How do you add "hundreds (if not thousands)" of items per minute? We
typically see speeds of 20-30 items per minute with our bot account. For
our purposes it would be convenient if that number could be increased.



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Stas Malyshev
Hi!

The service has caught up now, but for the near future I would like to ask
bot operators to throttle their edits a bit. In the meantime, I'll look
into speeding up the update process further, but given the nature of the
database it may still be possible to temporarily overload it with a large
enough update stream. So keeping bot updates under 10 per second would be
nice (this is a somewhat arbitrary back-of-the-envelope figure, so don't
take it *too* seriously). Note that this should not be too hard a limit:
it still allows every single record now in Wikidata to be updated in about
two weeks, which seems to be OK for most tasks. But it is a limitation
and, as I said, I'll work to eventually get rid of it.
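A cap like "under 10 updates per second" is easy to enforce client-side with a small sliding-window limiter. A sketch (the `clock` and `sleep` parameters are injectable only so the class can be tested deterministically):

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `rate` operations in any one-second window."""

    def __init__(self, rate=10, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # times of recent operations

    def acquire(self):
        """Block until another operation is allowed, then record it."""
        now = self.clock()
        # Forget operations older than one second.
        while self.stamps and now - self.stamps[0] >= 1.0:
            self.stamps.popleft()
        if len(self.stamps) >= self.rate:
            # Window is full: wait until the oldest entry expires.
            self.sleep(1.0 - (now - self.stamps[0]))
            return self.acquire()
        self.stamps.append(self.clock())
```

A bot would then call `limiter.acquire()` once before every edit; edits beyond the tenth in any second simply wait their turn.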

Thanks,
-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Tom Morris
So, the page that Markus points to describes heeding the replication lag
limit as a recommendation. Since running a bot is a privilege, not a
right, why isn't the "recommendation" a requirement instead?

Tom

On Wed, Nov 18, 2015 at 3:30 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 18.11.2015 19:40, Federico Leva (Nemo) wrote:
>
>> Andra Waagmeester, 18/11/2015 19:03:
>>
>>> How do you add "hundreds (if not thousands)" of items per minute?
>>>
>>
>> Usually
>> 1) concurrency,
>> 2) low latency.
>>
>
> In fact, it is not hard to get this. I guess Andra is getting speeds of
> 20-30 items because their bot framework is throttling the speed on purpose.
> If I don't throttle WDTK, I can easily do well over 100 edits per minute in
> a single thread (I did not try the maximum ;-).
>
> Already a few minutes of fast editing might push up the median dispatch
> lag sufficiently for a bot to stop/wait. While the slow edit rate is a
> rough guess (not a strict rule), respecting the dispatch stats is mandatory
> for Wikidata bots, so things will eventually slow down (or your bot be
> blocked ;-). See [1].
>
> Markus
>
> [1] https://www.wikidata.org/wiki/Wikidata:Bots
>


Re: [Wikidata] WDQS updates have stopped

2015-11-18 Thread Gerard Meijssen
Hoi,
So in essence, WDQ and WDQS have the same problem of keeping up with the
database. How do the two compare?
Thanks,
 GerardM
