This is excellent! Thanks for this update. A.
On Sat, Mar 1, 2014 at 9:01 AM, Emmanuel Engelhart <[email protected]> wrote:

> Hi
>
> For the first time, we have managed to release a complete dump of all
> encyclopedic articles of the Wikipedia in English, *with thumbnails*.
>
> This ZIM file is 40 GB in size and contains the current 4.5 million
> articles with their 3.5 million pictures:
> http://download.kiwix.org/zim/wikipedia_en_all.zim.torrent
>
> This ZIM file is directly and easily usable on many types of devices
> like Android smartphones and Win/OSX/Linux PCs with Kiwix, or Symbian
> with Wikionboard.
>
> You don't need modern computers with big CPUs. You can for example
> create a (read-only) Wikipedia mirror on a Raspberry Pi for ~100 USD by
> using our ZIM dedicated Web server called kiwix-serve. A demo is
> available here: http://library.kiwix.org/wikipedia_en_all/
>
> As always, we also provide a packaged version (for the main PC
> systems) which includes fulltext search index+ZIM file+binaries:
> http://download.kiwix.org/portable/wikipedia_en_all.zip.torrent
>
> What is interesting too: This file was generated in less than 2 weeks
> thanks to several recent innovations:
> * The Parsoid (cluster), which gives us an HTML output with additional
>   semantic RDF tags
> * mwoffliner, a Node.js script able to dump pages based on the MediaWiki
>   API (and Parsoid API)
> * zimwriterfs, a solution able to compile any local HTML directory to a
>   ZIM file
>
> We now have an efficient way to generate new ZIM files. Consequently, we
> will work to industrialize and automate the ZIM file generation
> process, which is probably the oldest and most important problem we
> still face at Kiwix.
>
> All this would not have been possible without the support of:
> * Wikimedia CH and the "ZIM autobuild" project
> * Wikimedia France and the Afripedia project
> * Gwicke from the WMF Parsoid dev team.
>
> BTW, we need additional help from developers with JavaScript/Node.js
> skills to fix a few issues on mwoffliner:
> * Recreate the "table of contents" based on the HTML DOM (*)
> * Scrape MediaWiki ResourceLoader in such a way that it continues to
>   work offline (***)
> * Scrape categories (**)
> * Localize the script (*)
> * Improve the global performance by introducing usage of workers (**)
> * Create nodezim, the libzim Node.js binding, and use it (***, also
>   needs compilation and C++ skills)
> * Evaluate the work necessary to merge mwoffliner and the new WMF PDF
>   Renderer (***)
>
> Emmanuel
> --
> Kiwix - Wikipedia Offline & more
> * Web: http://www.kiwix.org
> * Twitter: https://twitter.com/KiwixOffline
> * more: http://www.kiwix.org/wiki/Communication
>
> _______________________________________________
> Offline-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/offline-l
>

--
Asaf Bartov <[email protected]>
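For anyone tempted by the first item on the quoted to-do list (recreating the table of contents from the HTML DOM), here is a rough sketch of the nesting step only. It is not taken from mwoffliner; `buildToc`, `renderToc` and the `{ level, title, anchor }` record shape are hypothetical names made up for illustration. It assumes the headings have already been collected from the DOM in document order, with `level` between 1 and 6, using whatever parser the script relies on.

```javascript
// Hypothetical helper, not part of mwoffliner: turns a flat list of
// headings (in document order) into a nested table of contents.
// Assumes each entry looks like { level: 1..6, title, anchor }.
function buildToc(headings) {
  const root = { level: 0, title: null, anchor: null, children: [] };
  const stack = [root]; // last element is the most recently opened heading

  for (const { level, title, anchor } of headings) {
    const node = { level, title, anchor, children: [] };
    // Close every open heading that is at the same depth or deeper.
    while (stack[stack.length - 1].level >= level) {
      stack.pop();
    }
    // The remaining top of the stack is the nearest shallower heading.
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

// Renders the nested structure as indented plain-text lines.
function renderToc(nodes, depth = 0) {
  const lines = [];
  for (const node of nodes) {
    lines.push(`${'  '.repeat(depth)}- ${node.title} (#${node.anchor})`);
    lines.push(...renderToc(node.children, depth + 1));
  }
  return lines;
}

// Made-up sample data standing in for headings read from an article.
const sample = [
  { level: 2, title: 'History', anchor: 'History' },
  { level: 3, title: 'Early years', anchor: 'Early_years' },
  { level: 3, title: 'Modern era', anchor: 'Modern_era' },
  { level: 2, title: 'References', anchor: 'References' },
];

console.log(renderToc(buildToc(sample)).join('\n'));
```

A stack keeps this to a single pass and copes with skipped levels (an h4 directly under an h2 simply becomes its child). Whether that matches what the online table of contents does in such cases would need checking.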
