RE: Document scores(boost)

Markus Jelsma Thu, 10 Sep 2015 07:37:07 -0700

Yes, removing OPIC from the config will simply disable it.

The webgraph program creates a webgraph data structure for the specified 
segments. The linkrank program then calculates a score for each node. 
Finally, the scoreupdater writes the scores from the webgraph back into the 
crawldb. This process is very resource-intensive, so use it only if you 
really need it.
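
To make the three steps concrete, here is a toy Python sketch. It is not
Nutch code: the function names and the dict-based stand-ins for segments
and the crawldb are hypothetical, and the scoring loop is a simplified
PageRank-style iteration that I am assuming as a stand-in for the real
link analysis.

```python
def build_webgraph(segments):
    """Step 1 (webgraph): merge the outlinks of all segments into one graph.

    Each segment is assumed to be a dict of url -> list of outlink urls.
    """
    graph = {}
    for segment in segments:
        for url, outlinks in segment.items():
            graph.setdefault(url, set()).update(outlinks)
            for target in outlinks:
                # Link targets are nodes too, even if never fetched.
                graph.setdefault(target, set())
    return graph


def link_rank(graph, iterations=20, damping=0.85):
    """Step 2 (linkrank): iteratively compute a score for every node."""
    if not graph:
        return {}
    n = len(graph)
    scores = {url: 1.0 / n for url in graph}
    for _ in range(iterations):
        new_scores = {url: (1.0 - damping) / n for url in graph}
        for url, outlinks in graph.items():
            if not outlinks:
                continue  # toy choice: nodes without outlinks pass nothing on
            share = damping * scores[url] / len(outlinks)
            for target in outlinks:
                new_scores[target] += share
        scores = new_scores
    return scores


def update_crawldb(crawldb, scores):
    """Step 3 (scoreupdater): copy the graph scores into the crawldb entries.

    Toy choice: entries without a graph score keep their old score.
    """
    return {
        url: {**entry, "score": scores.get(url, entry.get("score", 0.0))}
        for url, entry in crawldb.items()
    }


# Two hypothetical segments from two crawl iterations.
segments = [{"a": ["b", "c"]}, {"b": ["c"], "c": ["a"]}]
crawldb = {url: {"status": "fetched", "score": 1.0} for url in "abc"}

graph = build_webgraph(segments)
crawldb = update_crawldb(crawldb, link_rank(graph))
for url in sorted(crawldb):
    print(url, round(crawldb[url]["score"], 4))
```

The point of the sketch is that the score is computed over the links of
all segments at once, and every pass over the whole graph is repeated for
each iteration, which is where the cost comes from.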


Markus
 
-----Original message-----
> From:Imtiaz Shakil Siddique <[email protected]>
> Sent: Thursday 10th September 2015 16:04
> To: [email protected]
> Subject: Re: Document scores(boost)
> 
> Hello Markus Jelsma,
> 
> So you are suggesting that I should
> 1. remove "scoring-opic" plugin
> 2. run the webgraph > linkrank > scoreupdater from /bin/crawl script
> if I want to calculate document boost with all segments in hand.
> 
> 
> It'd be very helpful if you could explain what these four things do
> (webgraph, linkrank, scoreupdater, nodedumper).
> 
> Thank you so much for the help.
> Imtiaz Shakil Siddique
> 
> 
> On 10 September 2015 at 19:27, Markus Jelsma <[email protected]>
> wrote:
> 
> > Hello - OPIC is useless in incremental crawls. You can either disable
> > scoring altogether, or use webgraph > linkrank > scoreupdater.
> > Markus
> >
> > -----Original message-----
> > > From:Imtiaz Shakil Siddique <[email protected]>
> > > Sent: Wednesday 9th September 2015 23:09
> > > To: [email protected]
> > > Subject: Document scores(boost)
> > >
> > > Hello,
> > > I've been using Nutch 1.9/1.10 for about six months. One thing I noticed
> > > is that at each iteration (during the parsing phase) Nutch calculates the
> > > document boost (using the OPIC algorithm).
> > >
> > > 1. My question is how this score is adjusted with respect to all the
> > > segments.
> > >
> > > 2. Another question: inside the bin/crawl script, what do webgraph,
> > > linkrank, scoreupdater, and nodedumper do? Can anyone be kind enough to
> > > explain?
> > >
> > > Thank you so much.
> > > Imtiaz Shakil Siddique
> > >
> >
>
