RE: Document scores(boost)

Imtiaz Shakil Siddique Thu, 10 Sep 2015 10:12:27 -0700

Hello Markus Jelsma,

Thank you for the advice. However, this score calculation runs after the
data has been indexed into Solr, so when the scores are updated in the
crawldb, Solr won't pick them up.


I think a workaround would be to move the Solr indexing phase to the end,
after all the other operations.
One thing I'm not clear about, though, is how often I should run these
webgraph update commands.
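
To make the reordering concrete, here is a rough dry-run sketch of one round
with indexing moved last. It only prints the commands rather than running
them. The flags are modelled on the commented-out WebGraph section of
bin/crawl as I understand it, so they should be checked against the
installed Nutch version; CRAWL_PATH, SOLR_URL, the segment name and the
plan_round function are placeholders of mine, not part of Nutch.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands for one round in the proposed order.
# CRAWL_PATH and SOLR_URL are hypothetical defaults; adjust to your setup.
CRAWL_PATH="${CRAWL_PATH:-crawl}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# plan_round SEGMENT
# Echoes (does not execute) the steps, with Solr indexing as the final step
# so that the index sees the scores written back by scoreupdater.
plan_round() {
  local segment="$1"
  # 1. Build the webgraph from all segments collected so far.
  echo "bin/nutch webgraph -filter -normalize -segmentDir $CRAWL_PATH/segments/ -webgraphdb $CRAWL_PATH"
  # 2. Compute LinkRank scores for every node in the webgraph.
  echo "bin/nutch linkrank -webgraphdb $CRAWL_PATH"
  # 3. Write the scores from the webgraph back into the crawldb.
  echo "bin/nutch scoreupdater -crawldb $CRAWL_PATH/crawldb -webgraphdb $CRAWL_PATH"
  # 4. Only now index the segment into Solr.
  echo "bin/nutch index -D solr.server.url=$SOLR_URL $CRAWL_PATH/crawldb -linkdb $CRAWL_PATH/linkdb $CRAWL_PATH/segments/$segment"
}
```

Calling plan_round with a segment directory name prints the four commands in
order; piping that output to a shell would execute them, assuming the paths
exist.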

Thank you,
Imtiaz Shakil Siddique
On Sep 10, 2015 8:36 PM, "Markus Jelsma" <[email protected]> wrote:

> Yes, removing OPIC from the config will simply disable it.
>
> The webgraph program will create a webgraph data structure for the
> specified segments. The linkrank program will then calculate the scores for
> each node. Finally, the scoreupdater writes the scores from the webgraph
> back into the crawldb. This process is very resource-intensive. Use it only
> if you really need it.
>
> Markus
>
> -----Original message-----
> > From:Imtiaz Shakil Siddique <[email protected]>
> > Sent: Thursday 10th September 2015 16:04
> > To: [email protected]
> > Subject: Re: Document scores(boost)
> >
> > Hello Markus Jelsma,
> >
> > So you are suggesting that I should
> > 1. remove "scoring-opic" plugin
> > 2. run the webgraph > linkrank > scoreupdater from /bin/crawl script
> > if I want to calculate document boost with all segments in hand.
> >
> >
> > It'd be very helpful if you could explain what these four things do
> > (webgraph, linkrank, scoreupdater, nodedumper).
> >
> > Thank you so much for the help.
> > Imtiaz Shakil Siddique
> >
> >
> > On 10 September 2015 at 19:27, Markus Jelsma <[email protected]> wrote:
> >
> > > Hello - OPIC is useless in incremental crawls. You can either disable
> > > scoring altogether, or use webgraph > linkrank > scoreupdater.
> > > Markus
> > >
> > > -----Original message-----
> > > > From:Imtiaz Shakil Siddique <[email protected]>
> > > > Sent: Wednesday 9th September 2015 23:09
> > > > To: [email protected]
> > > > Subject: Document scores(boost)
> > > >
> > > > Hello,
> > > > I've been using Nutch 1.9/1.10 for about six months. One thing I noticed
> > > > is that at each iteration (during the parsing phase) Nutch calculates a
> > > > document boost (using the OPIC algorithm).
> > > >
> > > > 1. My question is how this score is adjusted with respect to all the
> > > > segments.
> > > >
> > > > 2. Another question: inside the bin/crawl script, what do webgraph,
> > > > linkrank, scoreupdater and nodedumper do? Can anyone be kind enough to
> > > > explain?
> > > >
> > > > Thank you so much.
> > > > Imtiaz Shakil Siddique
> > > >
> > >
> >
>
