Hi,

Thanks for the quick answers, and for the useful link.

My previous e-mail was not detailed enough; sorry about that. Let me
clarify:
- I don't need to crawl the entire Wikipedia, only (for example) articles in
a category. ~1,000 articles would be a good start, and I definitely won't be
going above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage
link and get the article creation date (e.g. the creation dates of [[en:Brad
Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc.). As far as I can tell, this
means that I need one query for every language link.

The data are reasonably easy to get through the API. If my queries risk
overloading the server, I am obviously happy to go through the toolserver
(once my account gets approved!).
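
For concreteness, here is a minimal sketch of the kind of script I have in
mind (Python; the category name, contact address and 1-second delay are just
placeholders, and I have left out query continuation and error handling):

    import time
    import requests

    # Placeholder contact address -- the API etiquette page asks for an identifying User-Agent.
    HEADERS = {"User-Agent": "InterlanguageLinkStudy/0.1 (robin@example.org)"}

    def api_get(lang, **params):
        """One GET request against the API of a given language Wikipedia."""
        params.update({"format": "json", "maxlag": 5})  # maxlag lets the servers ask us to back off
        r = requests.get("https://%s.wikipedia.org/w/api.php" % lang,
                         params=params, headers=HEADERS)
        time.sleep(1)  # guessed politeness delay between queries -- this is exactly my question
        return r.json()

    def category_members(category):
        """Articles (namespace 0) in one category on en.wiki; continuation is omitted here."""
        data = api_get("en", action="query", list="categorymembers",
                       cmtitle="Category:" + category, cmnamespace=0, cmlimit=500)
        return [m["title"] for m in data["query"]["categorymembers"]]

    def language_links(title):
        """Interlanguage links of an en.wiki article, as (language code, foreign title) pairs."""
        data = api_get("en", action="query", prop="langlinks",
                       titles=title, lllimit="max")
        page = next(iter(data["query"]["pages"].values()))
        return [(ll["lang"], ll["*"]) for ll in page.get("langlinks", [])]

    def creation_date(lang, title):
        """Timestamp of the oldest revision, i.e. the article creation date."""
        data = api_get(lang, action="query", prop="revisions", titles=title,
                       rvdir="newer", rvlimit=1, rvprop="timestamp")
        page = next(iter(data["query"]["pages"].values()))
        return page["revisions"][0]["timestamp"]

    # Example run over a made-up category: one langlinks query per article,
    # plus one revisions query per language edition of that article.
    for title in category_members("French mathematicians"):
        print("en", title, creation_date("en", title))
        for lang, foreign_title in language_links(title):
            print(lang, foreign_title, creation_date(lang, foreign_title))

Even at one langlinks query per article plus one revisions query per language
link, the request count grows quickly towards the 40,000-article end of the
range, hence my question about acceptable query rates.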


Robin Ryder
----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE

> On 24.09.2010, 14:32 Robin wrote:
>
>> I would like to collect data on interlanguage links for academic research
>> purposes. I really do not want to use the dumps, since I would need to
>> download dumps of all language Wikipedias, which would be huge.
>> I have written a script which goes through the API, but I am wondering how
>> often it is acceptable for me to query the API. Assuming I do not run
>> parallel queries, do I need to wait between each query? If so, how long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?
>
> --
> Best regards,
>   Max Semenik ([[User:MaxSem]])
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
