Hi Tom, Nice article!
Tom White wrote:
Hi, I've written an article about using Nutch at the intranet scale, which you may find interesting: http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html .
I found it very enlightening. In the following installments I'd personally like to learn about:
* How to keep the index up to date - Nutch makes it simple to crawl the intranet, and once you start Tomcat you are flying, but what then? What's the best way to keep the search db fresh, i.e. revisit the existing pages and crawl new links?
* How to use the parse-ext module - to parse stuff on your intranet not supported by the existing parsers
* How to customize the web-interface
Please post any comments on the article page itself.
I have to register to do that!
