Read about OPIC scoring. It can be confusing indeed. In short, OPIC treats
score as "cash" that a fetched page hands to its outlinks, so the total score
stays roughly constant while the number of known URLs grows, which is why the
average sinks well below 1.0. I would not recommend using OPIC for incremental
crawls where you refetch pages over time.
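For reference, the OPIC idea in a few lines: score ("cash") moves from a
fetched page to its outlinks in equal shares, so the total is conserved while
the URL count grows. A simplified sketch of that model, not Nutch's actual
implementation (the URLs and function name are invented for illustration):

```python
def distribute_cash(cash, url, outlinks):
    """Hand url's cash to its outlinks in equal shares (pure OPIC step)."""
    share = cash[url] / len(outlinks)
    for link in outlinks:
        cash[link] = cash.get(link, 0.0) + share
    cash[url] = 0.0  # the page keeps nothing after the split

# One injected seed with score 1.0 discovers nine new URLs.
cash = {"http://seed/": 1.0}
distribute_cash(cash, "http://seed/", [f"http://seed/p{i}" for i in range(9)])

total = sum(cash.values())   # still ~1.0: cash moved, not created
average = total / len(cash)  # ~0.1, and it keeps falling as the db grows
```

In the stats quoted below, 1241 URLs at an average of ~0.0049 is a total of
only about 6, consistent with a few seeds' worth of injected cash having been
spread over roughly a thousand mostly-unfetched pages.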

> Ha! but out of curiosity, why is the average score so low out of 1.0? that
> seems pretty darned weak, whatever it is.
> 
> 
> TOTAL urls:     1241
> retry 0:        1241
> min score:      0.0
> avg score:      0.0049016923
> max score:      1.0
> status 1 (db_unfetched):        1001
> status 2 (db_fetched):  224
> status 3 (db_gone):     15
> status 5 (db_redir_perm):
> 
> On Thu, Sep 22, 2011 at 14:03, Markus Jelsma <[email protected]> wrote:
> > That is not necessary. At most you would delete the failed segment, or
> > delete all segment dirs except crawl_generate (or was it fetch_generate)
> > so you can restart the fetch from the beginning.
> > 
> > 
> > What do you use? The crawl command? I don't see any evidence of you
> > updating the DB ;). Anyway, never kill a running job unless you really
> > have to. It cannot be resumed.
> > 
> > > I had to delete the contents of the crawldb folder to recover from a
> > > failed fetch (was this the best response? I doubt it). Now I have a
> > > fetch running, successfully, but I don't see any evidence that it is
> > > writing anything to crawldb. Is it going to write all the crawldb
> > > stuff at the end, or should I go ahead and kill the crawl now?
