Hi,

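Try changing the properties related to max outlinks, such as
db.max.outlinks.per.page, which defaults to 100 in nutch-default.xml. It is
safer to override the value in your nutch-site.xml than to edit
nutch-default.xml directly. That should help.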
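
As a rough sketch, and assuming db.max.outlinks.per.page is the property
limiting your crawl (the property name and the choice of -1 are my assumptions
here, so check them against the nutch-default.xml shipped with your version),
an override in nutch-site.xml could look like this. A negative value tells
Nutch to process all outlinks on a page instead of stopping at the limit:

```xml
<!-- Hypothetical override for nutch-site.xml; verify the property name
     and its description in your own nutch-default.xml first. -->
<property>
  <name>db.max.outlinks.per.page</name>
  <!-- Negative value: process every outlink found on a page. -->
  <value>-1</value>
  <description>Maximum number of outlinks processed per page.</description>
</property>
```

The property goes inside the existing <configuration> element, next to your
http.content.limit and http.agent.name entries.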

Cheers,
Chris

On Dec 19, 2011, at 2:49 PM, tahere ganjiyar wrote:

> Hi, I am crawling a site that has 100 links at depth 1 and 100 links at depth
> 2, but Nutch only crawls 23 links from depth 1 and 30 from depth 2. How can I
> force Nutch to crawl all the links at depths 1 and 2? I use Nutch 1.3 with
> topN=10000
> depth=2
> and in my nutch-site.xml:
> <property>
>   <name>http.content.limit</name>
>   <value>-1</value>
>   <description></description>
> </property>
> <property>
>   <name>http.agent.name</name>
>   <value>My Nutch Spider</value>
>   <description></description>
> </property>
> 
> 
> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

  • Re: topN-help Mattmann, Chris A (388J)
