Hi Michael,

Replies inline

On Sat, Nov 12, 2016 at 7:10 PM, <user-digest-h...@nutch.apache.org> wrote:

> From: Michael Coffey <mcof...@yahoo.com.invalid>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Date: Sun, 13 Nov 2016 03:07:16 +0000 (UTC)
> Subject: How can I Score?
>
> When the generator is used with -topN, it is supposed to choose the
> highest-scoring urls.

Yes, -topN is the threshold for how many top-scoring URLs you wish to generate into a new fetch list and subsequently fetch. When you use the crawl script, -topN is calculated as follows:

    $numSlaves * 50000

By default, we assume that you are running on one machine (local mode), so the numSlaves variable is set to 1.

> In my case, all the urls in my db have a score of zero, except the ones
> injected.

This is a bit strange. I would not expect them to be exactly zero... are you sure that they are not marginally above zero? Which scoring plugin/mechanism are you currently using?

> How can I cause scores to be computed and stored?

Scores for each and every CrawlDatum are computed automatically out of the box.

> I am using the standard crawl script.

OK

> Do I need to enable the various webgraph lines in the script?

Not unless you wish to use the WebGraph scoring implementation...

Lewis

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney
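
For reference, the -topN arithmetic mentioned above can be sketched in a few lines of shell. This is only an illustration of the formula, not a copy of the crawl script: numSlaves comes from the message, while the sizeFetchlist variable name is an assumption made for this sketch.

```shell
#!/usr/bin/env bash
# Sketch of the -topN calculation described above (illustrative only).

# Number of worker machines; 1 corresponds to the local-mode default.
numSlaves=1

# Fetch-list size per iteration: 50000 URLs per machine.
# "sizeFetchlist" is an assumed name used only in this sketch.
sizeFetchlist=$((numSlaves * 50000))

echo "topN = $sizeFetchlist"
```

With numSlaves raised to, say, 4 for a four-node cluster, the same arithmetic gives a fetch list of 200000 URLs per iteration.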