James Linden wrote:
>> Why do you need to access the live wikipedia for this?
>> Using categorylinks.sql and page.sql you should be able to fetch the
>> same data. Probably faster.
> 
> In my research, the answer to this question is two-fold
> 
> A) Creating a local copy of wikipedia (using mediawiki and various
> import tools) is quite a process, and requires a significant
> investment of time and research unto itself.

You don't need a full local copy just to fetch, e.g., infoboxes.
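
As a rough illustration (a minimal sketch, not a tested tool), the infoboxes can be pulled straight out of a pages-articles XML dump by streaming it, with no MediaWiki import at all. The function names and the dump filename below are made up for the example, and the brace matching is deliberately naive: it counts "{{" and "}}" pairs and does not understand nowiki sections, comments, or triple-brace parameters.

```python
import bz2
import io
import re
import xml.etree.ElementTree as ET

# Matches the start of an infobox template, e.g. "{{Infobox person".
INFOBOX_START = re.compile(r"\{\{\s*infobox", re.IGNORECASE)


def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} template in wikitext, or None.

    Naive approach: count "{{" / "}}" pairs until the template closes.
    """
    match = INFOBOX_START.search(wikitext)
    if match is None:
        return None
    start = match.start()
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced; give up on this page


def iter_infoboxes(stream):
    """Yield (title, infobox) pairs from a MediaWiki XML dump stream."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "title":
            title = elem.text
        elif tag == "text":
            box = extract_infobox(elem.text or "")
            if box is not None:
                yield title, box
        elif tag == "page":
            elem.clear()  # release the finished page's text


# Usage, assuming a dump file with this (hypothetical) name:
#
#   with bz2.open("enwiki-pages-articles.xml.bz2", "rb") as dump:
#       for title, box in iter_infoboxes(dump):
#           print(title, len(box))
```

This reads the compressed dump sequentially, so it needs no database and little memory, at the cost of a full pass over the file for every run.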


> B) A few months ago, I pulled 333 semi-random articles from the live
> API -- of those, 329 had changed significantly since the 20100312
> dump (which was the newest dump at the time). A new check against
> the 20110115 dump shows a similar percentage.

Getting updated data may be a reason, but I don't think that's what
Ramesh wanted.
Plus, you wanted 333 articles, not all 3 million...


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l