Hi,

Thanks for the quick answers, and for the useful link.
My previous e-mail was not detailed enough; sorry about that. Let me clarify:

- I don't need to crawl the entire Wikipedia, only (for example) the articles in a category. ~1,000 articles would be a good start, and I definitely won't go above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage link and get the article creation date (i.e. the creation dates of [[en:Brad Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc.). As far as I can tell, this means one query for every language link.

The data are reasonably easy to get through the API. If my queries risk overloading the servers, I am of course happy to go through the toolserver instead (once my account gets approved!).

Robin Ryder

----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE

> On 24.09.2010, 14:32 Robin wrote:
>
>> I would like to collect data on interlanguage links for academic research
>> purposes. I really do not want to use the dumps, since I would need to
>> download dumps of all language Wikipedias, which would be huge.
>> I have written a script which goes through the API, but I am wondering how
>> often it is acceptable for me to query the API. Assuming I do not run
>> parallel queries, do I need to wait between each query? If so, how long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
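
[A minimal sketch of the kind of per-article, per-language API querying described above, assuming Python with the requests library. The category name, User-Agent string, and the one-second pause between requests are placeholders, and continuation of long result lists is not handled; the API parameters (list=categorymembers, prop=langlinks, prop=revisions with rvdir=newer) are standard MediaWiki API query modules.]

import time
import requests

HEADERS = {"User-Agent": "langlink-research-script/0.1 (contact: your-email)"}

def api_get(lang, **params):
    """One request to a given language Wikipedia's API; no parallel queries."""
    params.update({"action": "query", "format": "json", "maxlag": "5"})
    r = requests.get(f"https://{lang}.wikipedia.org/w/api.php",
                     params=params, headers=HEADERS)
    r.raise_for_status()
    time.sleep(1)  # placeholder politeness delay between successive queries
    return r.json()

def category_members(lang, category, limit=1000):
    """Titles of articles in a category (continuation not handled here)."""
    data = api_get(lang, list="categorymembers", cmtitle=category,
                   cmnamespace=0, cmlimit=min(limit, 500))
    return [m["title"] for m in data["query"]["categorymembers"]]

def langlinks(lang, title):
    """Interlanguage links of one article, as (language code, title) pairs."""
    data = api_get(lang, prop="langlinks", titles=title, lllimit=500)
    page = next(iter(data["query"]["pages"].values()))
    return [(ll["lang"], ll["*"]) for ll in page.get("langlinks", [])]

def creation_date(lang, title):
    """Timestamp of the first revision, i.e. the article creation date."""
    data = api_get(lang, prop="revisions", titles=title,
                   rvdir="newer", rvlimit=1, rvprop="timestamp")
    page = next(iter(data["query"]["pages"].values()))
    revs = page.get("revisions", [])
    return revs[0]["timestamp"] if revs else None

# Example run over a hypothetical category: one query per article per language.
for title in category_members("en", "Category:American film actors")[:10]:
    print(title, creation_date("en", title))
    for lang, foreign_title in langlinks("en", title):
        print("  ", lang, creation_date(lang, foreign_title))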
