Hey,
The tutorial says to use only a subset of the
DMOZ directory; however, for experimentation I went
ahead and used the whole thing. Later on, the
tutorial says this:

now we fetch a new segment with the top-scoring 1000
pages:

bin/nutch generate db segments -topN 1000
s2=`ls -d segments/2* | tail -1`
echo $s2
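As a rough illustration of what "top-scoring N" means, here is a minimal shell sketch. It assumes a hypothetical text list of "<url> <score>" lines and is only an analogy, not how Nutch actually builds a segment. It keeps the N highest-scoring entries, so N caps the size of the list regardless of how many pages exist in total. If fewer than N pages are available, you simply get all of them.

```shell
#!/bin/sh
# Hypothetical illustration only: each input line is "<url> <score>".
# top_n prints at most N lines, choosing those with the highest score.
top_n() {
  n="$1"
  # Sort numerically on field 2 in descending order (C locale so that
  # "." is the decimal point), then keep the first n lines.
  LC_ALL=C sort -k2,2nr | head -n "$n"
}

printf '%s\n' \
  'http://a.example/ 0.5' \
  'http://b.example/ 2.0' \
  'http://c.example/ 1.25' \
  'http://d.example/ 0.1' | top_n 2
# prints:
# http://b.example/ 2.0
# http://c.example/ 1.25
```

In this analogy, raising N only makes each generated list larger; it does not change which pages score highest.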

Is that top-scoring 1000 related to the total number
of pages in the database? Since I have many thousands
of pages, would it be better to increase this number?

Many Thanks.

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers