On 02/03/2014 01:33, Samuel Klein wrote:
> Brilliant. Congrats to everyone who is working on this!
> What is needed to scrape categories?
0 - For all dumped pages (so at least NS_MAIN and NS_CATEGORY pages), download the list of categories they belong to (via the MW API).
1 - For each dumped page, render the HTML category list at the bottom of the page.
2 - For each category page, get the HTML content rendering from Parsoid, then compute and render sorted lists of articles and sub-categories, similar to the online version (split across multiple pages if necessary).

All of this must be integrated into the Node.js script, and the category graph must be stored in Redis.

Emmanuel

--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
_______________________________________________
Offline-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/offline-l
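For step 0, a minimal sketch of the MW API request that fetches the categories of dumped pages. The query parameters (action=query, prop=categories, cllimit, clshow) are the real MediaWiki API ones; the helper name, the batching, and the en.wikipedia.org endpoint are assumptions for illustration:

```javascript
// Build the MediaWiki API URL that returns the categories a batch of
// dumped pages belongs to. Up to 50 titles can be joined with '|' in
// one request. Helper name and endpoint are hypothetical.
function buildCategoriesUrl(titles) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'categories',
    clshow: '!hidden',   // skip hidden maintenance categories
    cllimit: 'max',
    format: 'json',
    titles: titles.join('|'),
  });
  return 'https://en.wikipedia.org/w/api.php?' + params.toString();
}

// Example: categories for two dumped pages in one batched call.
console.log(buildCategoriesUrl(['Paris', 'France']));
```

The JSON response can then feed the category graph in Redis, e.g. one set of category members per category page.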
