We have worked out the following scheme for incremental crawling:

 

1.      Set depth = 1 and topN large enough (for example 100000)
2.      Clear the partial indexes left over from the previous iteration
3.      Copy the total (global) index into indexes
4.      Crawl a new segment
5.      Create an index for the new segment
6.      Remove duplicates (deldup) across the total index and the new index
7.      Merge the old total index and the new index into a new total index
8.      Replace the old total index with the new total index
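
To make the order of operations concrete, here is a rough sketch of one iteration. It is only an illustration: the directory names (total_index, indexes, total_index.new, total_index.old) and the four callables crawl, index, dedup and merge are hypothetical placeholders for whatever commands actually perform those steps, not real Nutch APIs.

```python
import shutil
from pathlib import Path
from typing import Callable


def run_iteration(
    root: Path,
    crawl: Callable[[int, int], Path],    # hypothetical: (depth, top_n) -> segment dir
    index: Callable[[Path, Path], None],  # hypothetical: (segment, output index dir)
    dedup: Callable[[Path], None],        # hypothetical: dedup all indexes under a dir
    merge: Callable[[Path, Path], None],  # hypothetical: (indexes dir, merged output dir)
) -> None:
    # Assumed directory layout, not taken from the original message.
    total = root / "total_index"
    partial = root / "indexes"
    new_total = root / "total_index.new"
    old_total = root / "total_index.old"

    # Step 2: clear partial indexes from the previous iteration.
    shutil.rmtree(partial, ignore_errors=True)
    partial.mkdir(parents=True)

    # Step 3: copy the current total index next to the new partial index.
    if total.exists():
        shutil.copytree(total, partial / "old_total")

    # Steps 1, 4 and 5: crawl one level deep, then index the new segment.
    segment = crawl(1, 100000)
    index(segment, partial / "new")

    # Step 6: remove duplicates across the old and new indexes.
    dedup(partial)

    # Step 7: merge everything into a fresh total index.
    shutil.rmtree(new_total, ignore_errors=True)
    merge(partial, new_total)

    # Step 8: swap by renaming, so the live index is missing only
    # for the short moment between the two renames.
    shutil.rmtree(old_total, ignore_errors=True)
    if total.exists():
        total.rename(old_total)
    new_total.rename(total)
```

In this sketch the searcher can keep reading total_index during steps 2 to 7, because only the copy under indexes is touched; the live directory changes only in step 8.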

 

Is there a way to optimize this scheme?

Is there a way to append the new indexes to the total index without copying it?

 

All of this is needed so that the search engine can be used while a new crawl is in progress.

Therefore the total index must be accessible all the time, or at least
the time during which it is inaccessible must be kept to a minimum.
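
One possible way to shrink that window, sketched here under assumptions: on a POSIX filesystem, the search engine could be pointed at a symbolic link, and the link could be switched to the new index in a single rename. The function name publish_index and both paths are hypothetical, and whether the searcher notices the new target without being reopened is not something this sketch addresses.

```python
import os
from pathlib import Path


def publish_index(new_index: Path, live_link: Path) -> None:
    """Point live_link (a symlink, hypothetical path) at new_index."""
    tmp_link = live_link.with_name(live_link.name + ".tmp")

    # Remove a stale temporary link left over from an interrupted run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()

    # Build the new link under a temporary name first...
    os.symlink(new_index.resolve(), tmp_link, target_is_directory=True)

    # ...then rename it over the live link. On POSIX this rename is
    # atomic, so readers see either the old target or the new one.
    os.replace(tmp_link, live_link)
```

With this approach the old index directory stays on disk until it is deleted explicitly, so searches already running against it are not cut off by the switch.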
