Re: [Nutch-dev] How to index in real time?

Alan Tanaman Wed, 17 Jan 2007 07:22:56 -0800

Hi,

> As far as i know, you should rebuild the index every time or merge the 
> indexes, so there is not an online index building. Consider asking this 
> to lucene mailing list.


We are doing something close to online indexing (not sub-second, but
sub-minute).  We do this by applying the adaptive-fetch patch and limiting
the scope of each crawl, so that we only fetch the items that have
changed.

As for the indexing, we are still using the existing mechanism, which
builds an entire index at once and then merges it, but we are planning to
write a patch that uses the Lucene API to update the existing index directly:
- add the new documents
- delete existing documents and re-add them so that they refer to the new
  segment IDs
- delete obsolete documents
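
To make the three steps concrete, here is a rough sketch of the update
logic. It is only a toy in-memory model keyed by URL, not the Lucene or
Nutch API and not the planned patch; the class name IncrementalIndexSketch,
the Doc holder, and the applyCrawl method are all made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for an index keyed by URL. Hypothetical; not Lucene or Nutch code. */
final class IncrementalIndexSketch {

    /** One indexed page: its URL, the segment it was fetched into, and its text. */
    static final class Doc {
        final String url;
        final String segmentId;
        final String content;

        Doc(String url, String segmentId, String content) {
            this.url = url;
            this.segmentId = segmentId;
            this.content = content;
        }
    }

    private final Map<String, Doc> byUrl = new HashMap<>();

    /**
     * Applies the result of one small crawl to the existing index.
     * fetched:      pages fetched in this crawl (new or changed)
     * obsoleteUrls: pages that no longer exist and should be dropped
     */
    void applyCrawl(List<Doc> fetched, Set<String> obsoleteUrls) {
        for (Doc d : fetched) {
            // Steps 1 and 2: delete any old entry for this URL, then add the
            // fresh one, so the entry now refers to the new segment id.
            byUrl.remove(d.url);
            byUrl.put(d.url, d);
        }
        for (String url : obsoleteUrls) {
            // Step 3: drop documents that are no longer wanted.
            byUrl.remove(url);
        }
    }

    /** Returns the segment id currently recorded for a URL, or null if absent. */
    String segmentOf(String url) {
        Doc d = byUrl.get(url);
        return d == null ? null : d.segmentId;
    }

    /** Number of documents currently in the index. */
    int size() {
        return byUrl.size();
    }
}
```

Changed pages are handled as delete-then-add rather than as an edit of the
stored entry, which is why the second step above is worded as delete/re-add.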

Be aware that this is not the most efficient way to use Nutch, but it
should make Nutch easier to use in an enterprise environment.

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
http://blog.idna-solutions.com




