Hi all!

As you may have noticed, the propagation of changes from Wikidata to the
Wikipedias has been slower than it should be. Because of this, changes on
Wikidata have been showing up on Wikipedia watchlists very late, or not at all.

Katie and I have investigated the causes for this and what we can do about it.
To keep you in the loop, here is what we found:

* A dispatcher needs about 3 seconds to dispatch 1000 changes to a client wiki.
* With ~300 client wikis, one full round takes about 900 seconds per 1000
changes, so one dispatcher can handle about 4000 changes per hour.
* We currently have two dispatchers running in parallel (on a single box, hume),
which gives us a capacity of 8000 changes/hour.
* We are seeing roughly 17000 changes per hour on wikidata.org - more than twice
our dispatch capacity.
* I want to try running 6 dispatcher processes; that would give us the capacity
to handle 24000 changes per hour (assuming linear scaling).
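
For anyone who wants to check the arithmetic, here is a rough sketch of the
calculation. The constant and function names are made up for illustration and
are not taken from the dispatcher code, and it assumes capacity scales linearly
with the number of processes, which we have not verified yet.

```python
import math

# Figures from the list above; the names are hypothetical, not from Wikibase.
SECONDS_PER_BATCH = 3      # time to dispatch one batch to one client wiki
BATCH_SIZE = 1000          # changes per batch
CLIENT_WIKIS = 300         # approximate number of client wikis
CHANGES_PER_HOUR = 17000   # observed load on wikidata.org


def capacity_per_hour(dispatchers):
    """Changes per hour that the given number of dispatchers can push
    to all client wikis, assuming linear scaling."""
    # One batch has to go to every client wiki: 3 s * 300 = 900 s per round.
    seconds_per_round = SECONDS_PER_BATCH * CLIENT_WIKIS
    rounds_per_hour = 3600 / seconds_per_round
    return int(dispatchers * rounds_per_hour * BATCH_SIZE)


def dispatchers_needed(changes_per_hour):
    """Smallest number of dispatchers that covers the given load."""
    return math.ceil(changes_per_hour / capacity_per_hour(1))


if __name__ == "__main__":
    for n in (1, 2, 6):
        print(n, "dispatcher(s):", capacity_per_hour(n), "changes/hour")
    print("needed for current load:", dispatchers_needed(CHANGES_PER_HOUR))
```

By these numbers, five processes would just about cover the current load;
six leaves some headroom.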

Katie has prepared a patch for running the additional dispatcher processes:
https://gerrit.wikimedia.org/r/#/c/55904/

Getting this patch in is currently the quickest way for us to make change
propagation work. I hope running all the processes on the same box is not a
problem; a second box for cron jobs will be set up "soon".

Future:

* Making the dispatcher a "real daemon" would probably help with getting it
deployed to more boxes.

* If the Job Queue gets support for delayed (and maybe also recurring) jobs, we
could use the existing JQ infrastructure, and wouldn't need any processes of
our own. I'm a bit unsure, though, how well we could control scaling in such a
setup.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.


_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
