We also have plans to make a quick indexer, but we have not got around to
it yet. The trick is to simply call the indexing code for a page as soon as
it is parsed. (This can even happen during the fetch, which would combine
fetch, parse and index into a single step.) The tradeoff is that some
information, such as inlink information, might not be available yet.
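To make the idea concrete, here is a minimal sketch of the control flow, not Nutch code: every class and method name below (`QuickIndexer`, `onFetched`, `updateInlinks`, the records, the in-memory map standing in for the search index) is hypothetical. It shows a page being indexed right after it is parsed, with the inlink count left empty because the link graph is not known yet, and a later pass filling it in.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical "quick indexer" sketch: each page is indexed immediately
 * after it is parsed, in the same step as the fetch. None of these types
 * are Nutch APIs; they only illustrate the fetch-parse-index flow.
 */
public class QuickIndexer {

    /** Hypothetical result of parsing one fetched page. */
    public record ParsedPage(String url, String title, String text) {}

    /** Hypothetical index entry; inlinkCount stays null until link data exists. */
    public record IndexDoc(String url, String title, String text, Integer inlinkCount) {}

    // In-memory map standing in for a real search index (assumption).
    private final Map<String, IndexDoc> index = new HashMap<>();

    /** Toy parser: the first line is the title, the rest is body text. */
    ParsedPage parse(String url, String rawContent) {
        int nl = rawContent.indexOf('\n');
        String title = nl < 0 ? rawContent : rawContent.substring(0, nl);
        String text = nl < 0 ? "" : rawContent.substring(nl + 1);
        return new ParsedPage(url, title, text);
    }

    /** Fetch-parse-index in one step: the page is searchable right away. */
    public void onFetched(String url, String rawContent) {
        ParsedPage page = parse(url, rawContent);
        // Inlink information is not known at this point, so it is left null.
        index.put(url, new IndexDoc(page.url(), page.title(), page.text(), null));
    }

    /** Later pass (e.g. after link analysis): fill in inlink counts for already-indexed pages. */
    public void updateInlinks(Map<String, Integer> inlinkCounts) {
        inlinkCounts.forEach((url, count) ->
            index.computeIfPresent(url, (u, doc) ->
                new IndexDoc(doc.url(), doc.title(), doc.text(), count)));
    }

    public IndexDoc lookup(String url) {
        return index.get(url);
    }

    public int size() {
        return index.size();
    }

    public static void main(String[] args) {
        QuickIndexer indexer = new QuickIndexer();
        indexer.onFetched("http://example.com/a", "Page A\nhello world");
        System.out.println(indexer.lookup("http://example.com/a"));
        indexer.updateInlinks(Map.of("http://example.com/a", 3));
        System.out.println(indexer.lookup("http://example.com/a"));
    }
}
```

The design choice is the tradeoff mentioned above: pages become searchable as soon as they are fetched, but any field that depends on the whole link graph has to be patched in by a later update rather than being present from the start.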

Keep an eye on the Jira list for a possible implementation. (Or try it
yourself if you are into Nutch hacking.)

On Mon, Jul 2, 2012 at 5:20 AM, 何建云 <[email protected]> wrote:

> Hi,
> I am using Nutch for a search engine. I cannot index webpages until the
> entire crawling process has ended, but I would like a quicker update
> operation: data crawled in the first several rounds should be added to the
> index even if the entire crawl process is not over yet.
> 1. Any good ideas?
> 2. If I do the indexing operation after every crawl depth, it will waste a
> lot of time, because the current solution rebuilds the whole index. Is
> it possible to index incrementally?
> Thanks.
