We also plan to build a quick indexer, but we have not gotten around to it yet. The trick is simply to call the indexing code for a page as soon as it is parsed. (Parsing can even happen during fetch, so this would combine fetch, parse, and index into a single step.) The tradeoff is that some information, such as inlink information, might not be available yet.
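
To make the idea concrete, here is a rough sketch in plain Python. It is not Nutch code: ParsedPage, InMemoryIndex, and fetch_parse_index are hypothetical names invented for illustration, and the "index" is just a dictionary. It only shows the shape of the loop and why inlinks can be incomplete at index time.

```python
# Hypothetical sketch; none of these names are Nutch APIs.
from dataclasses import dataclass, field


@dataclass
class ParsedPage:
    url: str
    text: str
    outlinks: list = field(default_factory=list)


class InMemoryIndex:
    """Stand-in for a real index backend; re-adding a URL overwrites it."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        self.docs[doc["url"]] = doc


def fetch_parse_index(urls, fetch, parse, index):
    """Index each page as soon as it is parsed, not after the whole crawl."""
    inlinks = {}  # target url -> source urls seen so far
    for url in urls:
        page = parse(url, fetch(url))
        # Tradeoff: only inlinks from pages processed earlier are known here.
        index.add({
            "url": page.url,
            "text": page.text,
            "inlinks": list(inlinks.get(page.url, [])),
        })
        # Record this page's outlinks as inlinks of their targets.
        for target in page.outlinks:
            inlinks.setdefault(target, []).append(page.url)
    return inlinks


# Two toy pages that link to each other.
pages = {"a": ("page a", ["b"]), "b": ("page b", ["a"])}
index = InMemoryIndex()
seen_inlinks = fetch_parse_index(
    ["a", "b"],
    fetch=lambda url: pages[url],
    parse=lambda url, raw: ParsedPage(url, raw[0], raw[1]),
    index=index,
)
```

In this toy run, page "a" is indexed before "b" has been parsed, so its document has no inlinks even though "b" links to it. A later pass would have to re-add "a" to fill that in.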
Keep an eye on the Jira list for a possible implementation. (Or try it yourself if you are into Nutch hacking.)

On Mon, Jul 2, 2012 at 5:20 AM, 何建云 <[email protected]> wrote:
> Hi,
> I am using Nutch for a search engine. I cannot index webpages until the
> entire crawling process has ended, but I would like a quick update
> operation. The data crawled so far could be added to the index
> even if the entire crawl process is not over yet.
> 1. Do you have any good ideas?
> 2. If I do the indexing operation after every crawl depth, it will waste a
> lot of time, because the current solution is rebuilding the whole index. Is
> it possible to index incrementally?
> Thanks.

