Re: [Wikitech-l] Wikidata change propagation

2013-01-04 Thread Daniel Kinzler
Thanks Rob for starting the conversation about this.

I have explained our questions about how to run updates in the mail titled
"Running periodic updates on a large number of wikis", because I feel that this
is a more general issue, and I'd like to decouple it a bit from the Wikidata
specifics.

I'll try to reply and clarify some other points below.

On 03.01.2013 23:57, Rob Lanphier wrote:
 The thing that isn't covered here is how it works today, which I'll
 try to quickly sum up.  Basically, it's a single cron job, running on
 hume[1].  
[..]
 When a change is made on wikidata.org with the intent of updating an
 arbitrary wiki (say, Hungarian Wikipedia), one has to wait for this
 single job to get around to running the update on whatever wikis are
 in line prior to Hungarian WP before it gets around to updating that
 wiki, which could be hundreds of wikis.  That isn't *such* a big deal,
 because the alternative is to purge the page, which will also work.

Worse: currently, we would need one cron job for each wiki to be updated. I have
explained this in more detail in the "Running periodic updates" mail.

 Another problem is that this is running on a specific, named machine.
 This will likely get to be a big enough job that one machine won't be
 enough, and we'll need to scale this up.

My concern is not so much scalability (the updater will just be a dispatcher,
shoveling notifications from one wiki's database to another) but the lack of
redundancy: we can't simply configure the same cron job on a second machine as
a fallback in case the first one crashes, because that would lead to conflicts
and duplicate updates. See the "Running periodic updates" mail for more.
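The conflict described above can be made concrete with a sketch: one common way
to let a standby machine take over safely is an advisory lock row that a
dispatcher instance must claim before doing any work. Everything below is a
hypothetical illustration (table name, owners, SQLite backend); it only shows
why running the same cron job on two machines needs coordination:

```python
import sqlite3
import time

# Hypothetical sketch: two dispatcher instances coordinating through an
# advisory lock row, so only one dispatches at a time. In production this
# would be a row in the repo wiki's database, not an in-memory SQLite table.

def try_acquire_lock(db, owner, ttl=60):
    """Claim the dispatcher lock unless another live owner still holds it."""
    now = time.time()
    cur = db.execute(
        "UPDATE dispatch_lock SET owner = ?, expires = ? "
        "WHERE owner IS NULL OR expires < ?",
        (owner, now + ttl, now),
    )
    db.commit()
    return cur.rowcount == 1  # we hold the lock iff the UPDATE matched a row

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dispatch_lock (owner TEXT, expires REAL)")
db.execute("INSERT INTO dispatch_lock VALUES (NULL, 0)")

print(try_acquire_lock(db, "hume"))    # first instance wins the lock
print(try_acquire_lock(db, "backup"))  # second instance backs off
```

If the lock expires because the holder crashed, the next caller's UPDATE
matches again and the standby machine takes over without duplicating work.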

 The problem is that we don't have a good plan for a permanent solution
 nailed down.  It feels like we should make this work with the job
 queue, but the worry is that once Wikidata clients are on every single
 wiki, we're going to basically generate hundreds of jobs (one per
 wiki) for every change made on the central wikidata.org wiki.

The idea is for the dispatcher jobs to look at all the updates on wikidata that
have not yet been handed to the target wiki, batch them up, wrap them in a Job,
and post them to the target wiki's job queue. When the job is executed on the
target wiki, the notifications can be further filtered, combined, and batched
using local knowledge. Based on this, the required purging is performed on the
client wiki, rerender/link-update jobs are scheduled, etc.
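That dispatch step could be sketched roughly like this (all names and data
shapes are hypothetical; the actual Wikibase dispatcher is more involved):

```python
from collections import defaultdict

# Hypothetical sketch of the dispatch step: read repo changes a client wiki
# has not yet seen, batch them, and enqueue ONE job per wiki per run.

def dispatch(changes, client_state, job_queues, batch_size=100):
    """changes: list of (change_id, payload); client_state: wiki -> last seen id."""
    for wiki, last_seen in client_state.items():
        pending = [c for c in changes if c[0] > last_seen][:batch_size]
        if not pending:
            continue
        # One batched job per wiki, instead of one job per change per wiki.
        job_queues[wiki].append({"type": "ChangeNotification", "changes": pending})
        client_state[wiki] = pending[-1][0]  # advance this wiki's pointer

changes = [(1, "Q1 label"), (2, "Q2 sitelink"), (3, "Q1 claim")]
state = {"huwiki": 0, "test2wiki": 2}   # test2wiki already saw changes 1-2
queues = defaultdict(list)
dispatch(changes, state, queues)

print(len(queues["huwiki"][0]["changes"]))    # all three changes, one job
print(len(queues["test2wiki"][0]["changes"])) # only the new change
```

The per-wiki "last seen" pointer is what lets the filtering, combining, and
local purging happen later, on the target wiki's own job runners.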

However, the question of where, when, and how to run the dispatcher process
itself is still open, which is what I hope to change with the "Running periodic
updates" mail.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Wikidata change propagation

2013-01-03 Thread Rob Lanphier
Hi folks,

One item that comes up pretty frequently in our regular conversations
with the Wikidata folks is the question of how change propagation
should work.  This email is largely directed at the relevant folks in
WMF's Ops and Platform Eng groups (and obviously, also the Wikidata
team), but I'm erring on the side of distributing too widely rather
than too narrowly.  I originally asked Daniel to send this (earlier
today my time, which was late in his day), but decided that even
though I'm not going to be as good at describing the technical details
(and I'm hoping he chimes in), I know a lot better what I was asking
for, so I should just write it.

The spec is here:
https://meta.wikimedia.org/wiki/Wikidata/Notes/Change_propagation#Dispatching_Changes

The thing that isn't covered here is how it works today, which I'll
try to quickly sum up.  Basically, it's a single cron job, running on
hume[1].  So, that means that when a change is made on wikidata.org,
one has to wait for this job to get around to running before the item.
 It'd be good for someone from the Wikidata team to

We've declared that Good Enough(tm) for now, where now is the period
of time where we'll be running the Wikidata client on a small number
of wikis (currently test2, soon Hungarian Wikipedia).

The problem is that we don't have a good plan for a permanent solution
nailed down.  It feels like we should make this work with the job
queue, but the worry is that once Wikidata clients are on every single
wiki, we're going to basically generate hundreds of jobs (one per
wiki) for every change made on the central wikidata.org wiki.
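To make the worry concrete, a back-of-the-envelope comparison of per-change
fan-out versus batched dispatch (all figures below are illustrative
assumptions, not measurements):

```python
# Rough sketch of the fan-out problem: with one job per change per wiki,
# job volume scales as wikis x changes; with batching, it is capped at
# wikis x dispatch runs. All numbers here are made up for illustration.

client_wikis = 800         # assumed count of client wikis
changes_per_min = 300      # assumed wikidata.org edit rate
dispatches_per_min = 2     # assumed dispatcher cadence per wiki

naive_jobs = client_wikis * changes_per_min        # one job per change per wiki
batched_jobs = client_wikis * dispatches_per_min   # one batched job per run

print(naive_jobs)    # 240000 jobs/minute
print(batched_jobs)  # 1600 jobs/minute
```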

Guidance on what a permanent solution should look like?  If you'd like
to wait for Daniel to clarify some of the tech details before
answering, that's fine.

Rob

[1]  http://wikitech.wikimedia.org/view/Hume



Re: [Wikitech-l] Wikidata change propagation

2013-01-03 Thread Rob Lanphier
On Thu, Jan 3, 2013 at 2:57 PM, Rob Lanphier ro...@wikimedia.org wrote:
 The thing that isn't covered here is how it works today, which I'll
 try to quickly sum up.  Basically, it's a single cron job, running on
 hume[1].  So, that means that when a change is made on wikidata.org,
 one has to wait for this job to get around to running before the item.
  It'd be good for someone from the Wikidata team to

*sigh* the dangers of sending email in haste (and being someone who
frequently composes email non-linearly).  What I meant to say was
this:

When a change is made on wikidata.org with the intent of updating an
arbitrary wiki (say, Hungarian Wikipedia), one has to wait for this
single job to get around to running the update on whatever wikis are
in line prior to Hungarian WP before it gets around to updating that
wiki, which could be hundreds of wikis.  That isn't *such* a big deal,
because the alternative is to purge the page, which will also work.

Another problem is that this is running on a specific, named machine.
This will likely get to be a big enough job that one machine won't be
enough, and we'll need to scale this up.

It would be good for Daniel or someone else from the Wikidata team to
chime in to verify I'm characterizing the problem correctly.

Rob
