Hi Michael,
Replies inline

On Sat, Nov 12, 2016 at 7:10 PM, <user-digest-h...@nutch.apache.org> wrote:

> From: Michael Coffey <mcof...@yahoo.com.invalid>
> To: "user@nutch.apache.org" <user@nutch.apache.org>
> Cc:
> Date: Sun, 13 Nov 2016 03:07:16 +0000 (UTC)
> Subject: How can I Score?
> When the generator is used with -topN, it is supposed to choose the
> highest-scoring urls.


Yes, this is the maximum number of top-scoring URLs you wish to generate
into a new fetch list and subsequently fetch. When you use the crawl
script, -topN is calculated as follows:

$numSlaves * 50000

By default, we assume that you are running on one machine (local mode),
so the numSlaves variable is set to 1.
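A minimal bash sketch of that arithmetic, just to make the scaling
concrete. numSlaves is the crawl script's variable; the helper top_n_for
and the name sizeFetchlist are illustrative assumptions, not necessarily
what the crawl script itself uses.

```shell
#!/usr/bin/env bash
# Sketch of the -topN arithmetic (not copied from the crawl script).
# top_n_for and sizeFetchlist are hypothetical names for illustration.
top_n_for() {
  local slaves=$1
  echo $(( slaves * 50000 ))   # 50000 URLs per slave node
}

numSlaves=1                               # default: local mode, one machine
sizeFetchlist=$(top_n_for "$numSlaves")   # value handed to the generator as -topN
echo "topN = $sizeFetchlist"              # prints: topN = 50000
```

On a 4-node cluster the same formula gives top_n_for 4, i.e. 200000
URLs per generate/fetch round.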


> In my case, all the urls in my db have a score of zero, except the ones
> injected.
>

This is a bit strange. I would not expect them to be exactly zero.
Are you sure they are not marginally above zero? Which scoring
plugin/mechanism are you currently using?


> How can I cause scores to be computed and stored?


Scores for each and every CrawlDatum are computed automatically
out-of-the-box.


> I am using the standard crawl script.


OK


> Do I need to enable the various webgraph lines in the script?
>
>
Not unless you wish to use the WebGraph scoring implementation...
Lewis


-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney
