On Fri, Sep 24, 2010 at 1:19 PM, Max Semenik <[email protected]> wrote:

> On 24.09.2010, 14:32 Robin wrote:
>
> > I would like to collect data on interlanguage links for academic research
> > purposes. I really do not want to use the dumps, since I would need to
> > download dumps of all language Wikipedias, which would be huge.
> > I have written a script which goes through the API, but I am wondering
> > how often it is acceptable for me to query the API. Assuming I do not
> > run parallel queries, do I need to wait between each query? If so, how
> > long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?
>

Full dumps are not required for retrieving interlanguage links.
For example, the latest fr dump contains a dedicated file for them:
http://download.wikimedia.org/frwiki/20100915/frwiki-20100915-langlinks.sql.gz

It will be a lot faster to download this file (only 75 MB) than to make more
than 1 million API calls for the fr wiki.
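The file does not even need to be imported into MySQL. Here is a rough
Python sketch (untested; the regex assumes the usual
(ll_from,'ll_lang','ll_title') row format of MediaWiki SQL dumps) that
streams the tuples straight out of the gzipped INSERT statements:

    import gzip
    import re

    # Matches one row of the langlinks table: (page_id,'lang','title'),
    # allowing backslash-escaped quotes inside lang and title.
    ROW_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")

    def iter_langlinks(path):
        """Yield (page_id, lang, title) tuples from a langlinks SQL dump."""
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                if not line.startswith("INSERT INTO"):
                    continue
                for page_id, lang, title in ROW_RE.findall(line):
                    yield int(page_id), lang, title.replace("\\'", "'")

    if __name__ == "__main__":
        for row in iter_langlinks("frwiki-20100915-langlinks.sql.gz"):
            print(row)

Run against the fr file above, this should print one (page_id, lang, title)
tuple per interlanguage link, and the same approach works for any other
wiki's langlinks dump.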

Nico
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l