For the pywikipedia-l listeners just tuning in: the Toolserver is
overloaded with interwiki bots, and we want to reduce that load. To that
end, we want to switch to a single bot that runs all the interwiki updates
from the Toolserver.

On 16 January 2012 09:19, Merlijn van Deen <valhall...@arctus.nl> wrote:

> The only reasonable action we can take to reduce the memory
> consumption is to let the OS do its job in freeing memory: using one
> process to track pages that have to be corrected (using the database,
> if possible), and one process to do the actual fixing (interwiki.py).
> This should be reasonably easy to implement (i.e. use a pywikibot page
> generator to generate a list of pages, use a database layer to track
> interlanguage links and popen('interwiki.py <page>') if this is a
> fixable situation)
>
>
I took some time yesterday to work out some details on this - see
http://piratepad.net/T29Uj4j1U4 . It boils down to this:

1) generation of a list of pages to work on: from the database, if possible
2) dispatching interwiki.py based on that list of pages and handling logging
3) interwiki.py itself

My suggestion is to split these tasks, create a simple interface
(e.g. WSGI) between 1) and 2), and use subprocesses between 2) and 3).
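
To make the step from 2) to 3) concrete, here is a rough sketch of a
dispatcher that starts one short-lived child process per page and logs the
result. It is only an illustration: the names build_command and dispatch
are made up, and it assumes interwiki.py accepts the page title as a single
positional argument, as in the 'interwiki.py <page>' form quoted above.

```python
import logging
import subprocess
import sys

log = logging.getLogger("interwiki-dispatch")


def build_command(title, script="interwiki.py"):
    # Assumption: the page title is passed as one positional argument,
    # mirroring the quoted "interwiki.py <page>" form.
    return [sys.executable, script, title]


def dispatch(titles, make_command=build_command, timeout=600):
    """Run one child process per page title; return {title: exit code}.

    A page whose child process exceeds the timeout is recorded as None.
    """
    results = {}
    for title in titles:
        try:
            proc = subprocess.run(
                make_command(title),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            log.warning("%s: timed out after %s seconds", title, timeout)
            results[title] = None
            continue
        log.info("%s: exit code %d", title, proc.returncode)
        if proc.stderr:
            log.debug("%s: stderr: %s", title, proc.stderr.strip())
        results[title] = proc.returncode
    return results
```

Because every page gets its own process, all memory goes back to the OS
when that process exits, which is the point of the split.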

Yesterday I worked mainly on speeding up the startup of interwiki.py, so
that we can spawn one process per page.

On the Toolserver side, I would appreciate any comments/work/existing work
on the creation of an interwiki graph from the database - there are already
tools that suggest images based on interwiki links, so this code should be
around - and hopefully be adaptable. The only goal for this process would
be to create a list of starting pages interwiki.py can use - i.e. graphs
with one or more missing links, but without any double links.
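
As a rough sketch of that selection step, assuming the interlanguage links
have already been read from the database into a dict that maps each
(lang, title) pair to the set of pages it links to. The function names and
the data layout are hypothetical, and I read 'double links' here as two
pages in the same language ending up in one group.

```python
from collections import defaultdict


def components(links):
    """Yield connected groups of pages, treating links as undirected.

    links maps (lang, title) -> set of (lang, title) pages it links to.
    """
    adj = defaultdict(set)
    for src, targets in links.items():
        adj[src]  # make sure pages without outgoing links are present
        for dst in targets:
            adj[src].add(dst)
            adj[dst].add(src)
    adj = dict(adj)
    seen = set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        yield comp


def is_fixable(comp, links):
    """True if the group has a missing link and no language appears twice."""
    langs = [lang for lang, _ in comp]
    if len(langs) != len(set(langs)):
        # Assumed meaning of "double links": two pages in one language.
        return False
    # A link is missing if some page does not link to every other page.
    return any(comp - {page} - links.get(page, set()) for page in comp)


def starting_pages(links):
    """Yield one page per fixable group for interwiki.py to start from."""
    for comp in components(links):
        if is_fixable(comp, links):
            yield min(comp)  # arbitrary but deterministic choice
```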

On the Pywikipedia side, some thoughts on running interwiki.py in a new
process would be welcome: for example, how can we improve startup time
('kill all the regexps!') and effectively spawn multiple processes? Which
parameters (throttles?) should be tuned? Et cetera.

Best,
Merlijn
_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette
