Thanks, I'm trying this. It consumes phenomenal amounts of memory
though - I keep getting a "Killed" message from Ubuntu, even with a
20 GB swap file. Will keep trying with an even bigger one.
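For what it's worth, the blow-up seems to come from building the whole
XML tree in RAM; a streaming parse keeps memory flat. A rough sketch of
what I mean (Python; the file name is hypothetical, and this isn't the
importer itself):

    import bz2
    import xml.etree.ElementTree as ET

    # Read the compressed dump as a stream; the tree is never held whole.
    f = bz2.BZ2File("enwiki-latest-pages-articles.xml.bz2")
    context = ET.iterparse(f, events=("start", "end"))
    _, root = next(context)  # grab the root <mediawiki> element

    pages = 0
    for event, elem in context:
        if event == "end" and elem.tag.endswith("}page"):
            pages += 1
            root.clear()  # discard finished pages so memory stays flat
    print(pages)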
I'll also give mwdumper another go.
Steve
On Wed, Jun 13, 2012 at 3:03 PM, Adam Wight s...@ludd.net wrote:
Hi all,
I've been tasked with setting up a local copy of the English
Wikipedia for researchers - sort of like another Toolserver. I'm not
having much luck, and wondered if anyone has done this recently, and
what approach they used? We only really need the current article text
- history and meta
On 2012-06-12 23:19, Steve Bennett wrote:
I've been tasked with setting up a local copy of the English
Wikipedia for researchers - sort of like another Toolserver. I'm not
having much luck,
Have your researchers learn Icelandic. Importing the
small Icelandic Wikipedia is fast. They can test
mwdumper seems to work for recent dumps:
http://lists.wikimedia.org/pipermail/mediawiki-l/2012-May/039347.html
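For the record, the usual pipeline is to have mwdumper emit SQL and pipe
it straight into MySQL, so nothing is staged on disk in between. A rough
way to drive it from Python (file and database names are placeholders,
and this assumes MySQL credentials live in ~/.my.cnf):

    import subprocess

    # mwdumper converts the XML dump into SQL INSERT statements.
    dump = subprocess.Popen(
        ["java", "-jar", "mwdumper.jar", "--format=sql:1.5",
         "enwiki-latest-pages-articles.xml.bz2"],
        stdout=subprocess.PIPE)

    # mysql reads those statements directly off the pipe.
    load = subprocess.Popen(["mysql", "wikidb"], stdin=dump.stdout)
    dump.stdout.close()  # so mwdumper sees a broken pipe if mysql dies
    load.wait()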
On Tue, Jun 12, 2012 at 11:19 PM, Steve Bennett stevag...@gmail.com wrote:
Hi all,
I've been tasked with setting up a local copy of the English
Wikipedia for researchers - sort of
I ran into this problem recently. A Python script is available at
https://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/Offline/mwimport.py
that will convert .xml.bz2 dumps into flat fast-import files which can be
loaded into most databases. Sorry, this tool is still alpha quality.
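To give an idea of what "flat fast-import files" means: the approach is
to stream pages out of the dump and write one tab-separated row per
page, which MySQL can bulk-load with LOAD DATA INFILE (Postgres has
COPY). A stripped-down sketch of the idea - not the actual script, and
the three-column layout is made up for illustration:

    import bz2
    import xml.etree.ElementTree as ET

    src = bz2.BZ2File("enwiki-latest-pages-articles.xml.bz2")
    out = open("pages.tsv", "w", encoding="utf-8")

    context = ET.iterparse(src, events=("start", "end"))
    _, root = next(context)  # root <mediawiki> element

    cur = {}
    for event, elem in context:
        if event != "end":
            continue
        tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace
        if tag == "title":
            cur["title"] = elem.text or ""
        elif tag == "id" and "id" not in cur:
            cur["id"] = elem.text or ""  # first <id> in a page is the page id
        elif tag == "text":
            cur["text"] = elem.text or ""
        elif tag == "page":
            # One row per page, with crude escaping so embedded tabs and
            # newlines don't break the flat format.
            row = [cur.get(k, "") for k in ("id", "title", "text")]
            out.write("\t".join(v.replace("\t", " ").replace("\n", "\\n")
                                for v in row) + "\n")
            cur = {}
            root.clear()  # keep memory flat
    out.close()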