Read about OPIC scoring. It can indeed be confusing. I would not recommend using OPIC for incremental crawls where you refetch pages over time.
> Ha! But out of curiosity, why is the average score so low out of 1.0? That
> seems pretty darned weak, whatever it is.
>
> TOTAL urls: 1241
> retry 0:    1241
> min score:  0.0
> avg score:  0.0049016923
> max score:  1.0
> status 1 (db_unfetched): 1001
> status 2 (db_fetched):   224
> status 3 (db_gone):      15
> status 5 (db_redir_perm):
>
> On Thu, Sep 22, 2011 at 14:03, Markus Jelsma <[email protected]> wrote:
>
> > That is not necessary. At most you would delete the failed segment, or
> > delete all segment dirs except crawl_generate (or was it fetch_generate)
> > so you can restart the fetch from the beginning.
> >
> > What do you use? The crawl command? I don't see any evidence of you
> > updating the DB ;). Anyway, never kill a running job unless you really
> > have to. It cannot be resumed.
> >
> > > I had to delete the contents of the crawldb folder to recover from a
> > > failed fetch (was this the best response? i doubt it). Now I have a
> > > fetch running, successfully, but I don't see any evidence that it is
> > > writing anything to crawldb. Is it going to write all the crawldb
> > > stuff at the end, or should I go ahead and kill the crawl now?
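One way to read those stats: an OPIC-style score is not a 0-to-1 probability per page but a share of "cash" passed around the link graph, so with ~80% of the URLs still unfetched and holding tiny scores, a low average is what you'd expect. A minimal sketch (using the numbers quoted above; the interpretation is my own, not something the stats output states) of how the average relates to the total score mass in the crawldb:

```python
# Figures copied from the readdb stats quoted above.
total_urls = 1241
avg_score = 0.0049016923
max_score = 1.0

# The average is just total score mass divided by URL count,
# so the whole crawldb holds only ~6 units of score, most of it
# concentrated in a few well-linked pages (hence max = 1.0 while
# the 1001 unfetched URLs drag the average toward zero).
total_mass = avg_score * total_urls
print(round(total_mass, 2))  # roughly 6.08
```

The takeaway is that comparing an individual score against 1.0 is misleading; scores are only meaningful relative to each other within one crawldb.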

