I'm not sure I understand you. Searching for "Amadeus Mozart" in the replicated databases could help, yes, but the number of articles that share a common string across languages is rather small, isn't it? It works for some specific concepts and personalities, but most article titles need to be translated, and a search using wildcards or regexps will not help with those.
Honestly, the pywikipedia team has changed a bit these last months, and the API edit will soon be available: I've been telling myself for days that interwiki.py will sooner or later need a rewrite. But it is not that easy. I understand your concept of an "interwiki class", but finding such a class does not appear to be that obvious. If you have a general pseudo-algorithm able to outline a specific class of articles on the same subject, please share it. But I think that the actual behavior -- starting from a specific page, building the interwiki links graph, and indexing the cycles -- while perhaps not optimal, cannot be avoided that easily.

2008/6/11 Purodha <[EMAIL PROTECTED]>:
> I could not store this comment on the blog server.
> Feel free to put it there if you can, or forward it elsewhere,
> if you see fit.
>
> Since an interwiki link needing propagation may exist only once in
> one specific wiki in one specific page, all pages having the
> potential for interwiki linking in each language of a project need
> to be read. There is no reason not to have a single bot doing
> this, but as pywikipediabot is currently structured, it is always
> operated starting from a selection of pages of one individual wiki
> only. These selections may be huge, such as all articles in the
> English Wikipedia (but no non-article pages, such as templates or
> category pages, and no other language). So with the current
> structure, it is advisable, for each language wiki, to have at
> least one bot starting from it regularly, propagating the "here
> only" set of links to the remaining wikis.
>
> There is another sad thing to mention. If only one link could not
> be set -- be it because of an edit conflict, a transient network
> error, server overload, or because a bot is not allowed to access
> a specific wiki -- the entire bot run for all linked articles in
> this interwiki class has to be repeated just to add this single
> missing link.
> The majority of interwiki bots serves only a
> comparatively small number of wikis. It's hard to get a single bot
> to serve all language wikis. It requires a lot of labour due to
> the sheer number of wikis there are; each and every wiki requires
> an individual account to be set up and an individual bot
> application, by rules specific to each wiki, which you have to
> find, read, understand, and obey. Proceedings and procedures
> vary, and are in part contradictory between wikis. Even if you
> follow their rules, some wiki communities, or their bureaucrats,
> just don't approve the bot, for one reason or another, or for
> none at all.
>
> An "interwiki class" is the set of pages each (needing to be)
> linked to each other in the same class. Such classes can be as
> small as two pages, and as big as one page from each wiki in a
> family.
>
> A slightly redesigned interwiki bot reading replicated databases
> and tables on the toolserver could collect class information
> much more efficiently than interwiki.py currently does by
> exporting groups of articles from each wiki. Provided there is no
> significant replication lag, it would even be more up to date when
> it comes to updating pages, because of its vastly higher speed of
> collecting the members of a class. Such a redesign would also make
> it easier to implement various helpful new ways of selecting which
> pages to look at, e.g. "language='all', title='Amadeus Mozart'",
> or ones using SQL wildcards or regular expressions, etc.
>
> Greetings.
> Purodha.
>
> _______________________________________________
> Toolserver-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/toolserver-l

-- 
Nicolas Dumazet — NicDumZ [ nIk.d̪ymz ] pywikipedia & mediawiki
Second-year student, ENSIMAG.
06 03 88 92 29
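For what it's worth, the "starting from a specific page, building the interwiki links graph" behavior discussed above is essentially a breadth-first traversal whose reachable set approximates one interwiki class. Here is a minimal sketch of that idea -- not interwiki.py's actual code; `SAMPLE_LINKS` and the stub fetcher are invented stand-ins for fetching each page's interwiki links:

```python
from collections import deque

# Hypothetical link data: maps (lang, title) -> the interwiki links
# found on that page. A real bot would fetch these from each wiki;
# this stub is only for illustration.
SAMPLE_LINKS = {
    ("en", "Wolfgang Amadeus Mozart"): [("de", "Wolfgang Amadeus Mozart"),
                                        ("fr", "Wolfgang Amadeus Mozart")],
    ("de", "Wolfgang Amadeus Mozart"): [("en", "Wolfgang Amadeus Mozart")],
    ("fr", "Wolfgang Amadeus Mozart"): [("en", "Wolfgang Amadeus Mozart"),
                                        ("de", "Wolfgang Amadeus Mozart")],
}

def collect_interwiki_class(start, get_links):
    """Breadth-first traversal of interwiki links from a start page.

    Returns the set of pages reachable via interwiki links -- an
    approximation of one "interwiki class".
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in get_links(page):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

cls = collect_interwiki_class(("en", "Wolfgang Amadeus Mozart"),
                              lambda p: SAMPLE_LINKS.get(p, []))
# cls now contains the en, de, and fr pages of this class.
```

Note that each new member found means another round of page fetches, which is exactly why a failed edit forces the whole run to be repeated.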
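Purodha's replicated-database idea could look roughly like the following. The `page` and `langlinks` tables are part of the standard MediaWiki schema; everything else here is a sketch, using an in-memory SQLite database as a stand-in for the toolserver replicas:

```python
import sqlite3

# Toy stand-in for a replicated wiki database. Real replicas expose
# the MediaWiki `page` and `langlinks` tables; the sample rows below
# are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER, page_title TEXT);
CREATE TABLE langlinks (ll_from INTEGER, ll_lang TEXT, ll_title TEXT);
INSERT INTO page VALUES (1, 'Wolfgang_Amadeus_Mozart');
INSERT INTO langlinks VALUES (1, 'de', 'Wolfgang Amadeus Mozart');
INSERT INTO langlinks VALUES (1, 'fr', 'Wolfgang Amadeus Mozart');
""")

# All interwiki links leaving a given title in this wiki -- the raw
# material for building an interwiki class, with no article export.
rows = conn.execute("""
    SELECT ll.ll_lang, ll.ll_title
    FROM page AS p
    JOIN langlinks AS ll ON ll.ll_from = p.page_id
    WHERE p.page_title = ?
""", ("Wolfgang_Amadeus_Mozart",)).fetchall()
```

Repeating this query per replica (one database per language) and feeding the results back in would yield the class members, and selections like `title LIKE '%Mozart%'` fall out of the SQL for free.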
