On Fri, Sep 24, 2010 at 1:19 PM, Max Semenik <[email protected]> wrote:
> On 24.09.2010, 14:32 Robin wrote:
>
> > I would like to collect data on interlanguage links for academic
> > research purposes. I really do not want to use the dumps, since I
> > would need to download dumps of all language Wikipedias, which would
> > be huge.
> >
> > I have written a script which goes through the API, but I am wondering
> > how often it is acceptable for me to query the API. Assuming I do not
> > run parallel queries, do I need to wait between each query? If so,
> > how long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?

Full dumps are not required for retrieving interlanguage links. For
example, the last fr dump contains a dedicated file for them:

http://download.wikimedia.org/frwiki/20100915/frwiki-20100915-langlinks.sql.gz

It will be a lot faster to download this file (only 75M) than to make
more than 1 million calls to the API for the fr wiki.

Nico
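
To make that concrete, here is a rough sketch (untested against a real
dump) of streaming such a file in Python and pulling out the
(page id, language, title) triples. It assumes the standard three-column
langlinks schema (ll_from, ll_lang, ll_title) and that each INSERT
statement sits on one line, as mysqldump emits them:

#!/usr/bin/env python3
# Rough sketch, untested against a real dump: stream a langlinks SQL dump
# and yield (page id, language, title) triples. Assumes the standard
# three-column schema (ll_from, ll_lang, ll_title) and one INSERT
# statement per line.
import gzip
import re

# One (ll_from,'ll_lang','ll_title') tuple; the character classes allow
# backslash-escaped quotes inside titles.
TUPLE_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")

def iter_langlinks(path):
    with gzip.open(path, mode='rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            if not line.startswith('INSERT INTO'):
                continue
            for page_id, lang, title in TUPLE_RE.findall(line):
                # Only apostrophes are unescaped here; full MySQL
                # unescaping is left out for brevity.
                yield int(page_id), lang, title.replace("\\'", "'")

if __name__ == '__main__':
    for row in iter_langlinks('frwiki-20100915-langlinks.sql.gz'):
        print(row)

One pass over the 75M file replaces the million-plus API round trips
entirely, and the same script works unchanged on any wiki's langlinks
dump.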

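For wikis where the API route is still needed, the usual etiquette as I
understand it is to keep requests serial and pass maxlag so the script
backs off whenever the database replicas fall behind. A minimal sketch
using prop=langlinks; the endpoint, page title, and User-Agent contact
below are illustrative placeholders, not values from this thread:

#!/usr/bin/env python3
# Minimal sketch of polite, serial API querying with maxlag. Endpoint,
# page title and User-Agent contact are illustrative placeholders.
import json
import time
import urllib.parse
import urllib.request

API = 'https://fr.wikipedia.org/w/api.php'   # hypothetical target wiki
HEADERS = {'User-Agent': 'langlink-research/0.1 (research@example.org)'}

def fetch_langlinks(title, retries=5):
    params = urllib.parse.urlencode({
        'action': 'query',
        'prop': 'langlinks',    # interlanguage links for the given page
        'titles': title,
        'lllimit': 'max',
        'format': 'json',
        'maxlag': 5,            # ask the server to refuse us when lagged
    })
    for _ in range(retries):
        req = urllib.request.Request(API + '?' + params, headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        if data.get('error', {}).get('code') == 'maxlag':
            time.sleep(5)       # replicas are behind; wait and retry
            continue
        return data
    raise RuntimeError('server stayed lagged after %d attempts' % retries)

if __name__ == '__main__':
    print(fetch_langlinks('Paris'))

No extra sleep is added between requests; making requests one at a time
and honouring maxlag is, as far as I know, considered sufficient
throttling for reads, though a User-Agent with real contact details is
expected either way.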