-------- Original Message --------
Subject: The Whole Wikipedia in English with pictures in one 40GB big file
Date: Sat, 01 Mar 2014 18:01:10 +0100
From: Emmanuel Engelhart <>
To:
<>, Using Wikimedia projects and
MediaWiki offline <>
Cc: Wikimedia developers <>


For the first time, we have managed to release a complete dump of all
encyclopedic articles of the English Wikipedia, *with thumbnails*.

This ZIM file is 40 GB in size and contains the current 4.5 million
articles with their 3.5 million pictures:

This ZIM file is directly and easily usable on many types of devices
like Android smartphones and Win/OSX/Linux PCs with Kiwix, or Symbian
with Wikionboard.

You don't need a modern computer with a powerful CPU. You can, for
example, create a (read-only) Wikipedia mirror on a Raspberry Pi for
~100 USD by using our dedicated ZIM web server, kiwix-serve. A demo is
available here:

As always, we also provide a packaged version (for the main PC
systems) which includes the full-text search index, the ZIM file and
the binaries:

What is also interesting: this file was generated in less than 2 weeks
thanks to several recent innovations:
* The Parsoid (cluster), which gives us HTML output with additional
semantic RDF tags
* mwoffliner, a Node.js script able to dump pages based on the MediaWiki
API (and Parsoid API)
* zimwriterfs, a solution able to compile any local HTML directory into a
ZIM file
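
To make the "dump pages based on the MediaWiki API" step concrete, here is a minimal sketch in Python. It is not mwoffliner's code (which is a Node.js script) and it does not use the Parsoid API; it only shows how a request for one page's rendered HTML can be built with the standard `action=parse` module of the MediaWiki Action API. The function names and the endpoint in the usage note are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_parse_url(api_endpoint, title):
    """Return an Action API URL asking for the rendered HTML of one page.

    `api_endpoint` is the wiki's api.php URL, supplied by the caller.
    """
    params = {
        "action": "parse",      # render a page
        "page": title,          # page title to render
        "prop": "text",         # only return the HTML body
        "format": "json",
        "formatversion": "2",   # "text" is then a plain string
    }
    return api_endpoint + "?" + urlencode(params)


def extract_html(response):
    """Pull the HTML string out of a decoded action=parse JSON response."""
    return response["parse"]["text"]
```

For example, `build_parse_url("https://en.wikipedia.org/w/api.php", "Raspberry Pi")` yields a URL that can be fetched with any HTTP client; the HTML returned by `extract_html` would then be written into the local directory that zimwriterfs compiles.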

We now have an efficient way to generate new ZIM files. Consequently, we
will work to industrialize and automate the ZIM file generation
process, which is probably the oldest and most important problem we
still face at Kiwix.

All this would not have been possible without the support of:
* Wikimedia CH and the "ZIM autobuild" project
* Wikimedia France and the Afripedia project
* Gwicke from the WMF Parsoid dev team.

BTW, we need additional help from developers with JavaScript/Node.js
skills to fix a few issues in mwoffliner:
* Recreate the "table of contents" based on the HTML DOM (*)
* Scrape the MediaWiki ResourceLoader in such a way that it continues to
work offline (***)
* Scrape categories (**)
* Localize the script (*)
* Improve the overall performance by introducing the use of workers (**)
* Create nodezim, the libzim Node.js binding, and use it (***, also needs
compilation and C++ skills)
* Evaluate the work needed to merge mwoffliner and the new WMF PDF
Renderer (***)
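
As an illustration of the first item in the list above (recreating the table of contents from the HTML), here is a rough sketch. It is written in Python with only the standard library, whereas the real fix would be done in mwoffliner's Node.js code against its DOM. The names `HeadingCollector` and `build_toc` are hypothetical, and the sketch assumes that each section heading is an h2-h6 element whose anchor id sits either on the heading itself or on a child element.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, anchor_id, text) for each h2-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, anchor_id, text_parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if self._current is None and tag in HEADING_TAGS:
            self._current = [int(tag[1]), dict(attrs).get("id"), []]
        elif self._current is not None and self._current[1] is None:
            # Assumption: the anchor id may be carried by a child element.
            self._current[1] = dict(attrs).get("id")

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == "h%d" % self._current[0]:
            level, anchor, parts = self._current
            self.headings.append((level, anchor, "".join(parts).strip()))
            self._current = None


def build_toc(html):
    """Return a list of (number, title, anchor_id), e.g. ("1.2", "Later", "Later")."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    toc, stack, counters = [], [], []
    for level, anchor, text in collector.headings:
        # Close every open section that is at the same or a deeper level.
        while stack and stack[-1] >= level:
            stack.pop()
        stack.append(level)
        depth = len(stack)
        del counters[depth:]          # forget numbering of closed subsections
        if len(counters) < depth:
            counters.append(0)        # first entry at this depth
        counters[depth - 1] += 1
        toc.append((".".join(str(n) for n in counters), text, anchor))
    return toc
```

Using a stack of heading levels rather than the raw level number keeps the numbering contiguous even when a page skips a level (an h4 directly under an h2, for instance).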

Kiwix - Wikipedia Offline & more
* Web:
* Twitter:
* more:
