Hi Karl. Here is what I understand.
A minimal crawl does not run the cleanup phase, which is the phase that removes no-longer-reachable documents. So even when a link to a page is removed from the root page, a minimal crawl is not supposed to remove the no-longer-reachable page from the index.

If the hop count is set to 1, the no-longer-reachable page should not be affected, because its hop count does not exceed 1.

If the above is correct, then I do not understand why the no-longer-reachable page is deleted from the index.

2014-12-24 13:59 GMT+09:00 Karl Wright <[email protected]>:

> Hi Shigeki,
>
> Minimal crawls do not guarantee that there is no document deletion. Such
> crawls only do the least amount of work possible based on what model the
> underlying connector implements. This often just means not doing the
> "cleanup" phase at the end of the job run, which typically removes
> no-longer-reachable documents. But if, for instance, you are using the web
> connector and you have hop count filtering enabled, then the framework will
> keep track of hop count and will remove all documents that exceed it, which
> does not require the end-of-job cleanup phase.
>
> If your goal is to avoid removing any previously crawled documents, then I
> am afraid that MCF does not have any real support for your model. "Start
> minimal" is certainly not going to help you.
>
> Thanks,
> karl
