Here is the output from that:

TOTAL urls: 297
retry 0: 297
min score: 0.0
avg score: 0.023377104
max score: 2.009
status 2 (db_fetched): 295
status 5 (db_redir_perm):
reinhard schwab wrote:
>
> try
>
> bin/nutch readdb crawl/crawldb -stats
>
> are there any unfetched pages?
>
> nutchcase wrote:
>> My crawl always stops at depth=3. It gets documents but does not
>> continue any further.
>> Here is my nutch-site.xml
>>
>> <?xml version="1.0"?>
>> <configuration>
>>   <property>
>>     <name>http.agent.name</name>
>>     <value>nutch-solr-integration</value>
>>   </property>
>>   <property>
>>     <name>generate.max.per.host</name>
>>     <value>1000</value>
>>   </property>
>>   <property>
>>     <name>plugin.includes</name>
>>     <value>protocol-http|urlfilter-(crawl|regex)|parse-html|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>>   </property>
>>   <property>
>>     <name>db.max.outlinks.per.page</name>
>>     <value>1000</value>
>>   </property>
>> </configuration>

--
View this message in context: http://www.nabble.com/crawl-always-stops-at-depth%3D3-tp25981603p25998652.html
Sent from the Nutch - User mailing list archive at Nabble.com.
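To answer reinhard's question mechanically, one can scan the `readdb -stats` output for a `status 1 (db_unfetched)` line; if it is absent, every known URL has been fetched or redirected, so the generator has nothing new to emit at the next depth. A minimal sketch (the helper `unfetched_count` is hypothetical, not part of Nutch; the sample lines are taken from the stats output quoted above):

```python
import re

def unfetched_count(stats_text: str) -> int:
    """Return the db_unfetched count from `bin/nutch readdb ... -stats`
    output, or 0 if no such status line is present."""
    m = re.search(r"status 1 \(db_unfetched\):\s*(\d+)", stats_text)
    return int(m.group(1)) if m else 0

# Sample lines from the stats output in this thread: no db_unfetched
# status appears, so nothing remains for the next generate/fetch cycle.
sample = """TOTAL urls: 297
retry 0: 297
status 2 (db_fetched): 295
"""
print(unfetched_count(sample))  # -> 0
```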