Re: [Wikitech-l] How to mount a local copy of the English Wikipedia for researchers?

2012-06-13 Thread Steve Bennett
Thanks, I'm trying this. It consumes phenomenal amounts of memory, though: I keep getting a "Killed" message from Ubuntu, even with a 20 GB swap file. Will keep trying with an even bigger one. I'll also give mwdumper another go. Steve On Wed, Jun 13, 2012 at 3:03 PM, Adam Wight s...@ludd.net

[Wikitech-l] How to mount a local copy of the English Wikipedia for researchers?

2012-06-12 Thread Steve Bennett
Hi all, I've been tasked with setting up a local copy of the English Wikipedia for researchers - sort of like another Toolserver. I'm not having much luck, and wondered if anyone has done this recently, and what approach they used? We only really need the current article text - history and meta

Re: [Wikitech-l] How to mount a local copy of the English Wikipedia for researchers?

2012-06-12 Thread Lars Aronsson
On 2012-06-12 23:19, Steve Bennett wrote: I've been tasked with setting up a local copy of the English Wikipedia for researchers - sort of like another Toolserver. I'm not having much luck, Have your researchers learn Icelandic. Importing the small Icelandic Wikipedia is fast. They can test

Re: [Wikitech-l] How to mount a local copy of the English Wikipedia for researchers?

2012-06-12 Thread Jona Christopher Sahnwaldt
mwdumper seems to work for recent dumps: http://lists.wikimedia.org/pipermail/mediawiki-l/2012-May/039347.html On Tue, Jun 12, 2012 at 11:19 PM, Steve Bennett stevag...@gmail.com wrote: Hi all,  I've been tasked with setting up a local copy of the English Wikipedia for researchers - sort of

Re: [Wikitech-l] How to mount a local copy of the English Wikipedia for researchers?

2012-06-12 Thread Adam Wight
I ran into this problem recently. A Python script is available at https://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/Offline/mwimport.py that converts .xml.bz2 dumps into flat fast-import files, which can be loaded into most databases. Sorry, this tool is still alpha quality.
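For readers following the thread, here is a minimal sketch of the general idea of turning a .xml.bz2 dump into a flat file while keeping memory use low. It is not the mwimport.py script linked above, and its output format is not that script's format. The function names (local_name, iter_pages, dump_to_tsv) and the two-column tab-separated layout are assumptions made for illustration. It assumes a current-revisions dump, where each page holds a single revision.

```python
import bz2
import csv
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> element in the stream.

    Hypothetical helper. If a page holds several revisions, the text of
    the last one wins, so this is only suitable for current-revision dumps.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Drop everything parsed so far, so the tree never grows with the dump.
        root.clear()


def dump_to_tsv(dump_path, out_path):
    """Write one tab-separated (title, text) row per page; return the page count."""
    count = 0
    with bz2.open(dump_path, "rb") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for title, text in iter_pages(src):
            writer.writerow([title, text])
            count += 1
    return count
```

The dump is decompressed and parsed as a stream, and each page is discarded once its row has been written, so memory use should stay roughly constant regardless of dump size. Loading the resulting file into a particular database is left out here, since the bulk-load syntax differs between database systems.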