2010/9/24 Robin Ryder <[email protected]>:
> - I don't need to crawl the entire Wikipedia, only (for example) articles in
> a category. ~1,000 articles would be a good start, and I definitely won't be
> going above ~40,000 articles.
> - For every article in the data set, I need to follow every interlanguage
> link, and get the article creation date (i.e. creation date of [[en:Brad
> Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc). As far as I can tell, this
> means that I need one query for every language link.
>
Unfortunately, this is true. You can't use a generator because those don't work with interwiki titles, and you can't query multiple titles in one request because prop=revisions only allows that in get-only-the-latest-revision mode (and you want the earliest revision).
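
For illustration, here is a rough Python sketch of the one-request-per-language-link approach. It is only a sketch under some assumptions: the helper names are made up for this example, the endpoint pattern https://<lang>.wikipedia.org/w/api.php is assumed to hold for every language code you encounter, and the parsing assumes the classic JSON output (pages keyed by page ID, langlink titles under "*"). The earliest revision is requested with rvdir=newer and rvlimit=1.

```python
import json
import urllib.request
from urllib.parse import urlencode


def api_url(lang):
    # Assumed endpoint pattern; some language codes may not map this cleanly.
    return "https://%s.wikipedia.org/w/api.php" % lang


def langlinks_params(title):
    # Ask for all interlanguage links of one article.
    return {"action": "query", "prop": "langlinks", "titles": title,
            "lllimit": "max", "format": "json"}


def first_revision_params(title):
    # rvdir=newer lists the oldest revision first, so rvlimit=1 yields
    # the creation revision. Only one title per request in this mode.
    return {"action": "query", "prop": "revisions", "titles": title,
            "rvlimit": 1, "rvdir": "newer", "rvprop": "timestamp",
            "format": "json"}


def build_request(lang, params):
    # Full GET URL for the given language edition.
    return api_url(lang) + "?" + urlencode(params)


def parse_langlinks(response):
    # Returns a list of (language code, title) pairs.
    links = []
    for page in response["query"]["pages"].values():
        for link in page.get("langlinks", []):
            links.append((link["lang"], link["*"]))
    return links


def parse_creation_date(response):
    # Returns the timestamp of the first revision, or None if the page
    # is missing or has no revisions in the response.
    for page in response["query"]["pages"].values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0]["timestamp"]
    return None


def fetch_json(url):
    # Placeholder User-Agent; replace with something that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "creation-date-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def creation_dates(title, lang="en"):
    # Serial requests only: one for the language links, then one per link.
    dates = {lang: parse_creation_date(
        fetch_json(build_request(lang, first_revision_params(title))))}
    links = parse_langlinks(
        fetch_json(build_request(lang, langlinks_params(title))))
    for other_lang, other_title in links:
        url = build_request(other_lang, first_revision_params(other_title))
        dates[other_lang] = parse_creation_date(fetch_json(url))
    return dates
```

Calling creation_dates("Brad Pitt") would issue the requests one after another and return a dict mapping language codes to creation timestamps. Continuation of long langlinks lists and error handling are left out here.
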

Hitting the API repeatedly without waiting between requests and without making parallel requests is considered acceptable usage AFAIK, but I do think that the Toolserver would better suit your needs.

Roan Kattouw (Catrope)

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
