James Linden wrote:
>> Why do you need to access the live wikipedia for this?
>> Using categorylinks.sql and page.sql you should be able to fetch the
>> same data. Probably faster.
>
> In my research, the answer to this question is two-fold
>
> A) Creating a local copy of wikipedia (using mediawiki and various
> import tools) is quite a process, and requires a significant
> investment of time and research unto itself.
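
For illustration, the categorylinks.sql / page.sql route quoted above comes down to a single join once those two tables are loaded somewhere. The sketch below is only that, a sketch: it uses an in-memory SQLite database with made-up rows instead of a real MySQL import, and keeps only the columns the join needs (page_id, page_namespace, page_title, cl_from, cl_to). The helper names build_toy_db and pages_in_category are hypothetical. Note that these two tables hold page metadata and category membership only; the article text itself is not in them.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory stand-in for the two imported dump tables.

    Assumption: only the columns needed for the join are modelled here;
    the real page and categorylinks tables have more columns.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id        INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title     TEXT
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER,  -- page_id of the member page
            cl_to   TEXT      -- category title, without the namespace prefix
        );
        -- Made-up sample rows, purely for illustration.
        INSERT INTO page VALUES
            (1, 0, 'Berlin'),
            (2, 0, 'Paris'),
            (3, 14, 'Capitals_in_Europe');
        INSERT INTO categorylinks VALUES
            (1, 'Capitals_in_Europe'),
            (2, 'Capitals_in_Europe'),
            (3, 'Capitals');
        """
    )
    return conn


def pages_in_category(conn, category):
    """Return titles of main-namespace pages that belong to `category`."""
    rows = conn.execute(
        """
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ? AND p.page_namespace = 0
        ORDER BY p.page_title
        """,
        (category,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(pages_in_category(build_toy_db(), "Capitals_in_Europe"))
```

The namespace filter (page_namespace = 0) restricts the result to articles, so category pages that are themselves members of a category are left out.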

You don't need to do a full copy to e.g. fetch infoboxes.

> B) A few months ago, I pulled 333 semi-random articles from the live
> API -- of those, 329 of them have significant enough changes since
> 20100312 dump (which was the newest dump at the time). A new check
> against the 20110115 dump has similar percentage.

Getting updated data may be a reason, but I don't think that's what
Ramesh wanted. Plus, you wanted 333 articles, not the 3 million...

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
