[Nutch-general] Injecting single URLs to an index

Robert Young Mon, 09 Jul 2007 01:58:10 -0700

I asked a similar question last week, but I don't think I explained
myself properly. I have created a Nutch/Lucene index using the
normal crawl, merge, dedup process. The problem is that this whole
process takes a long time. I would like to be able to inject single
URLs (triggered, for example, by documents being changed) and have
them appear in search results very quickly, without having to rebuild
the whole index. How can this be done?


I have been trying to do the following, without success:

1. Crawl and index the new URL.
2. Copy the live index.
3. Dedup against the live copy.
4. Merge with the live copy.
5. Replace the live index with the new index.
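
To make the intent of the five steps above concrete, here is a minimal
toy model in Python. It is only a sketch: an "index" is modelled as a
plain list of documents, and every name in it (Doc, dedup, merge,
refresh) is hypothetical and not part of Nutch or Lucene. It models
only dedup by URL, keeping the most recently fetched copy, and says
nothing about how the real tools lay out their index directories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    fetched: int  # fetch time; a larger value means a newer copy (hypothetical field)
    text: str


def dedup(indexes):
    """Return copies of the indexes in which each URL survives only once (newest wins)."""
    winner = {}  # url -> (fetched, index number, position in that index)
    for i, index in enumerate(indexes):
        for j, doc in enumerate(index):
            current = winner.get(doc.url)
            if current is None or doc.fetched > current[0]:
                winner[doc.url] = (doc.fetched, i, j)
    keep = {(i, j) for _, i, j in winner.values()}
    return [
        [doc for j, doc in enumerate(index) if (i, j) in keep]
        for i, index in enumerate(indexes)
    ]


def merge(indexes):
    """Concatenate several indexes into one."""
    return [doc for index in indexes for doc in index]


def refresh(live, fresh):
    """Steps 2 to 4: copy the live index, dedup it against the fresh one, merge."""
    live_copy = list(live)                          # step 2: work on a copy
    live_copy, fresh = dedup([live_copy, fresh])    # step 3: drop stale duplicates
    return merge([live_copy, fresh])                # step 4: one combined index


# Step 1 is assumed to have produced `fresh`; step 5 is the caller
# swapping the returned index in place of the live one.
live = [Doc("http://a/", 1, "old a"), Doc("http://b/", 1, "b")]
fresh = [Doc("http://a/", 2, "new a")]
new_live = refresh(live, fresh)
```

In this model the stale copy of a re-crawled URL is dropped from the
live copy before the merge, and the original live index is left
untouched until the final swap.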

The process seems to work apart from step 3: I cannot seem to dedup a
previously merged index against an unmerged one. I imagine I am
looking at the problem from completely the wrong direction.

Cheers
Rob

