You don't need to install scoring-depth. It's just that the term 'depth' in
the old crawl class has been replaced by 'rounds', which is more accurate.

The equivalent of the command you used to call should be
bin/crawl phfaws crawl 1

The value for topN needs to be set in the crawl script; see sizeFetchlist in
[https://github.com/apache/nutch/blob/master/src/bin/crawl#L117]
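
To spell out the mapping, here is a rough sketch of how the old arguments line up with the new script. The variable names below are just for illustration and are not taken from the crawl script itself; the only names from Nutch are bin/crawl and sizeFetchlist. It only prints the command, it does not run a crawl.

```shell
# Old (Nutch 1.4):  bin/nutch crawl phfaws -dir crawl -depth 1 -topN 50000
# New (Nutch 1.12): bin/crawl <seed dir> <crawl dir> <number of rounds>

seed_dir=phfaws     # seed directory, same as before
crawl_dir=crawl     # was "-dir crawl"
rounds=1            # was "-depth 1"

# -topN is no longer a command-line argument. This is the value you would
# set for sizeFetchlist by editing bin/crawl (illustrative variable only).
size_fetchlist=50000

cmd="bin/crawl $seed_dir $crawl_dir $rounds"
echo "$cmd"
echo "sizeFetchlist to set in bin/crawl: $size_fetchlist"
```

Check what sizeFetchlist is set to in your copy of the script before changing it; it may already match the topN you were using.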

HTH

Julien

On 31 January 2017 at 16:49, Chip Calhoun <[email protected]> wrote:

> I'm upgrading from Nutch 1.4 to Nutch 1.12. I limit this crawl to my
> seeds, so my 1.4 command was:
> bin/nutch crawl phfaws -dir crawl -depth 1 -topN 50000
>
> My understanding is that the "crawl" command is deprecated, "-depth" went
> with it, and I need to install the scoring-depth plugin. I'm new to adding
> plugins. The instructions at https://wiki.apache.org/nutch/AboutPlugins
> give a sample command, but I don't know what the official PluginRepository
> for this plugin is and the sample link for the HtmlParser plugin is dead.
>
> I'll appreciate any help. Thank you!
>
> Chip Calhoun
> Digital Archivist
> Niels Bohr Library & Archives
> American Institute of Physics
> One Physics Ellipse
> College Park, MD  20740
> 301-209-3180
> https://www.aip.org/history-programs/niels-bohr-library
>
>


-- 

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
