Hey,

The tutorial says to use only a subset of the DMOZ directory, but for experimentation I went ahead and used the whole thing. Later in the tutorial it says this:
  Now we fetch a new segment with the top-scoring 1000 pages:

    bin/nutch generate db segments -topN 1000
    s2=`ls -d segments/2* | tail -1`
    echo $s2

Is that "top-scoring 1000" tied to the number of pages in my database? Since I have many thousands of pages, would it be better to increase this number?

Many thanks.

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
