Re: [Offline-l] The Whole Wikipedia in English with pictures in one 40GB big file

Emmanuel Engelhart Sun, 02 Mar 2014 02:09:07 -0800

On 02/03/2014 01:33, Samuel Klein wrote:
> Brilliant.  Congrats to everyone who is working on this!
> What is needed to scrape categories?


0 - For all dumped pages (so at least NS_MAIN and NS_CATEGORY pages),
download the list of categories they belong to (with the MW API).
1 - For each dumped page, implement the HTML rendering of the category
list at the bottom.
2 - For each category page, get the rendered HTML content from Parsoid,
then compute and render sorted lists of articles and sub-categories in a
similar fashion to the online version (with multiple pages if necessary).
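
The three steps above can be sketched roughly as follows. This is only an
illustration, not the actual dump script: the function names, the "catlinks"
class, the link scheme and the page size of 200 are all assumptions. The
query parameters (action=query, prop=categories, cllimit, clcontinue) are the
standard MW API ones.

```javascript
// Hypothetical helpers; none of these names come from the real dump script.

// Step 0: build a MW API URL that lists the categories of a batch of pages.
// Pass the "clcontinue" value from a previous response to fetch the next batch.
function categoriesQueryUrl(apiBase, titles, clcontinue) {
  const params = new URLSearchParams({
    action: "query",
    prop: "categories",
    titles: titles.join("|"),
    cllimit: "max",
    format: "json",
  });
  if (clcontinue) params.set("clcontinue", clcontinue);
  return `${apiBase}?${params.toString()}`;
}

function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Step 1: render the category list shown at the bottom of a dumped page.
// The relative link scheme is an assumption about how the dump names files.
function renderCategoryFooter(categories) {
  if (categories.length === 0) return "";
  const items = categories.map((name) => {
    const href = encodeURIComponent("Category:" + name.replace(/ /g, "_"));
    return `<li><a href="${href}">${escapeHtml(name)}</a></li>`;
  });
  return `<div class="catlinks"><ul>${items.join("")}</ul></div>`;
}

// Step 2: sort the members of a category and split them into pages.
// The default page size is an assumption.
function paginateMembers(titles, pageSize = 200) {
  const sorted = [...titles].sort((a, b) => a.localeCompare(b));
  const pages = [];
  for (let i = 0; i < sorted.length; i += pageSize) {
    pages.push(sorted.slice(i, i + pageSize));
  }
  return pages;
}
```

paginateMembers would be called twice per category page, once for the
articles and once for the sub-categories, so each list gets its own paging.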

All of this must be integrated into the Node.js script, and the category
graph must be stored in Redis.
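
One possible way to lay out the category graph in Redis is with plain sets,
one per direction. The sketch below only builds the commands (SADD is a
standard Redis command) so it runs without a server; the key names and the
function name are assumptions, not what the script uses.

```javascript
// Hypothetical key layout:
//   page:categories:<title>     -> set of categories the page belongs to
//   category:members:<category> -> set of pages (articles or sub-categories)
function categoryGraphCommands(pageTitle, categories) {
  const commands = [];
  for (const category of categories) {
    commands.push(["SADD", `page:categories:${pageTitle}`, category]);
    commands.push(["SADD", `category:members:${category}`, pageTitle]);
  }
  return commands;
}

// Example: the commands to record that "Paris" is in two categories.
const example = categoryGraphCommands("Paris", ["Capitals in Europe", "Cities in France"]);
for (const command of example) {
  console.log(command.join(" "));
}
```

Storing both directions makes the footer of a page and the member list of a
category each a single SMEMBERS lookup, at the cost of writing every edge
twice.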

Emmanuel
-- 
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication

_______________________________________________
Offline-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/offline-l
