nutch crawl everything

KRIS MUSSHORN Fri, 09 Sep 2016 11:21:03 -0700

Running the following command does NOT index everything at and below the URLs in seed.txt. 

./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TEST_CORE urls/ crawl -1


I have to run it multiple times to get all content. 

Is it possibly related to this setting in nutch-site.xml? 

<property> 
<name>db.max.outlinks.per.page</name> 
<value>-1</value> 
<description> 
allow unlimited outlinks with -1 
</description> 
</property> 
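
To rule out a typo or a misplaced property, the value could be read back from the config with a small script. This is only a sketch: the helper name get_property and the inline SAMPLE string are made up for illustration and are not part of Nutch, and it assumes the usual Hadoop-style layout of <property> elements inside a <configuration> root. 

```python
import xml.etree.ElementTree as ET

# Inline sample standing in for conf/nutch-site.xml (illustrative only;
# assumes the usual <configuration> root element).
SAMPLE = """<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


print(get_property(SAMPLE, "db.max.outlinks.per.page"))
```

To check a real file, the contents of conf/nutch-site.xml would be passed in place of SAMPLE. 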

Thx, 

Kris
