I'm not sure I understand you.

Searching for "Amadeus Mozart" in the replicated databases could help,
yes, but the number of articles that share a common title across
languages is quite small, isn't it?
It works for some specific concepts and personalities, but most
article titles need to be translated, and a search using wildcards or
regexps is not going to help with those.
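To make the limitation concrete, here is a small sketch of the kind of cross-wiki title query being discussed. On the real toolserver this would run against the replicated `page` tables (`enwiki_p.page`, `dewiki_p.page`, ...); the in-memory SQLite database and the sample titles below are purely illustrative stand-ins:

```python
import sqlite3

# Simulate two replicated "page" tables. On the actual toolserver these
# would be enwiki_p.page, dewiki_p.page, etc.; an in-memory SQLite
# database is used here only to illustrate the shape of the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enwiki_page (page_title TEXT);
    CREATE TABLE dewiki_page (page_title TEXT);
    INSERT INTO enwiki_page VALUES ('Wolfgang_Amadeus_Mozart'), ('Cheese');
    INSERT INTO dewiki_page VALUES ('Wolfgang_Amadeus_Mozart'), ('Kaese');
""")

# Titles shared verbatim across both wikis: this works for proper names
# like "Wolfgang Amadeus Mozart", but misses translated titles such as
# "Cheese" vs. "Kaese" -- exactly the limitation discussed above.
shared = conn.execute("""
    SELECT e.page_title
    FROM enwiki_page e
    JOIN dewiki_page d ON e.page_title = d.page_title
""").fetchall()
print(shared)  # [('Wolfgang_Amadeus_Mozart',)]
```

The join only ever finds the small set of titles that happen to be spelled identically in both languages, which is why wildcards or regexps do not solve the translation problem either.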

Honestly, the pywikipedia team has changed a bit these last few months,
and API editing will soon be available: I've been telling myself for
days that interwiki.py will sooner or later need a rewrite. But this
is not that easy.

I understand your concept of an "interwiki class", but finding such a
class does not appear to be that obvious.

If you have a general pseudo-algorithm able to outline a
specific class of articles on the same subject, please share it. But I
think that the current behavior -- starting from a specific page,
building the interwiki links graph, and indexing the cycles -- while
perhaps not optimal, cannot be avoided that easily.
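The current behavior can be sketched as a breadth-first traversal over the interwiki link graph. The `LINKS` dict below is a stand-in for the page fetching that interwiki.py actually performs; the page names are made up for illustration:

```python
from collections import deque

# Toy interwiki graph: (lang, title) -> pages it links to in other
# languages. In the real bot these edges come from fetching each page's
# wikitext; this dict stands in for that network step.
LINKS = {
    ("en", "Cheese"): [("de", "Kaese"), ("fr", "Fromage")],
    ("de", "Kaese"): [("en", "Cheese")],
    ("fr", "Fromage"): [("de", "Kaese")],
}

def collect_interwiki_class(start):
    """Breadth-first traversal collecting every page reachable via
    interwiki links -- the "interwiki class" of the start page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

cls = collect_interwiki_class(("en", "Cheese"))
# Every page in the class should link to every other; links missing
# from the graph (e.g. fr -> en here) are what the bot would then add.
print(sorted(cls))
```

The expensive part is not the traversal itself but fetching every page in the class from every wiki, which is what makes a rerun after a single failed edit so costly.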


2008/6/11 Purodha <[EMAIL PROTECTED]>:
> I could not store this comment on the blog server.
> Feel free to put it there if you can, or forward it elsewhere,
> if you see fit.
>
> Since an interwiki link needing propagation may exist only once, in
> one specific wiki, in one specific page, all pages having the
> potential for interwiki linking in each language of a project need
> to be read. There is no reason not to have a single bot doing
> this, but as pywikipediabot is currently structured, it is always
> operated starting from a selection of pages of one individual wiki
> only. These selections may be huge, such as all articles in the
> English Wikipedia (but no non-article pages, such as templates or
> category pages, and no other language). So with the current
> structure, it is advisable, for each language wiki, to have at
> least one bot starting from it regularly, propagating the links
> set "here only" to the remaining wikis.
>
> There is another sad thing to mention. If even one link could not
> be set -- be it because of an edit conflict, a transient network
> error, server overload, or because a bot is not allowed to access
> a specific wiki -- the entire bot run for all linked articles in
> this interwiki class has to be repeated just to add this single
> missing link. The majority of interwiki bots serve only a
> comparatively small number of wikis. It is hard to get a single bot
> to serve all language wikis: it requires a lot of labour due to
> the sheer number of wikis there is. Each and every wiki requires
> an individual account to be set up and an individual bot
> application under rules specific to that wiki, which you have to
> find, read, understand, and obey; proceedings and procedures
> vary, and are in part contradictory between wikis. Even if you
> follow their rules, some wiki communities, or their bureaucrats,
> simply decline, for one reason or another, or for none at all.
>
> An "interwiki class" is the set of pages that each need to be
> linked to every other page in the same class. Such classes can be as
> small as two pages, or as big as one page from each wiki in a
> family.
>
> A slightly redesigned interwiki bot reading the replicated databases
> and tables on the toolserver could collect class information
> much more efficiently than interwiki.py currently does by
> exporting groups of articles from each wiki. Provided there is no
> significant replication lag, it would even be more up to date when
> it comes to updating pages, because of its vastly higher
> speed of collecting the members of a class. Such a redesign would
> also make it easier to implement various helpful new ways of
> selecting which pages to look at, e.g. "language='all',
> title='Amadeus Mozart'", or ones using SQL wildcards or regular
> expressions, etc.
>
> Greetings.
> Purodha.
>
>
>
> _______________________________________________
> Toolserver-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l
>



-- 
Nicolas Dumazet — NicDumZ [ nIk.d̪ymz ]
pywikipedia & mediawiki
Second year at ENSIMAG.
06 03 88 92 29