mos wrote:
The problem at www.gildemeister.com is that the links are generated with JavaScript. That is why Nutch can't find the other pages: the links are invisible to its HTML parser. Two ideas:
- You need something like a sitemap that links to the other main pages. If one isn't available right now, you could try to generate it (e.g. from the Apache log file).
- Enhance the Nutch HTML parser so that it can interpret the JavaScript links.
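The log-file idea could be sketched roughly as follows. This is only an illustration, assuming the server writes the Apache common or combined log format; the function names `paths_from_log` and `link_page` and the base URL in the usage note are made up for the example. It collects every path that was served with status 200 and writes a plain HTML page of static links that a crawler can follow.

```python
import re
from html import escape

# Matches the request and status fields of an Apache common/combined log line.
# Assumption: only GET requests are of interest for building the link page.
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def paths_from_log(lines):
    """Return the sorted set of paths that were served with status 200."""
    found = set()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("status") == "200":
            found.add(m.group("path"))
    return sorted(found)


def link_page(base_url, paths):
    """Build a plain HTML page with one static link per path."""
    items = "\n".join(
        f'<li><a href="{escape(base_url + p)}">{escape(p)}</a></li>'
        for p in paths
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

In practice you would call something like `link_page("http://www.example.com", paths_from_log(open("access.log")))`, save the result on the site, and add that page to the crawl seed list. You would probably also want to filter out images, stylesheets and similar resources first.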
You can try activating parse-js - it can extract JavaScript snippets embedded in HTML actions, and figure out the links. It works reasonably well, at least most of the time... ;-)
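To give a feel for the kind of heuristic involved, here is a minimal sketch. It is not the actual parse-js code, just an assumed illustration: it scans a JavaScript snippet for quoted string literals that look like URLs or page paths and resolves them against the page's base URL. The name `js_links` and the list of file extensions are invented for the example.

```python
import re
from urllib.parse import urljoin

# Quoted string literals that look like an absolute URL, a root-relative
# path, or a file name ending in a typical page extension.
# This is a heuristic: it will miss some links and report some false ones.
URL_LITERAL = re.compile(
    r"""["']((?:https?://|/)[^"'\s]+|[^"'\s]+\.(?:html?|php|jsp))["']""",
    re.IGNORECASE,
)


def js_links(js_source, base_url):
    """Return absolute URLs for URL-like string literals in js_source."""
    links = []
    for match in URL_LITERAL.finditer(js_source):
        url = urljoin(base_url, match.group(1))
        if url not in links:  # keep first-seen order, drop duplicates
            links.append(url)
    return links
```

Links that are assembled at run time (string concatenation, lookups in arrays and so on) stay invisible to this kind of pattern matching, which is one reason such an approach only works most of the time.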
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
