Hi,

This is likely a URL filter problem. You can try the parsechecker tool (bin/nutch parsechecker <url>) to see how Nutch parses a page and which links it finds. The number of outlinks it reports should be more or less the same as the number of links on the page. Also check your conf/regex-urlfilter.txt; the missing URLs are most likely being filtered out by that plugin.
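In case it helps to see why a filter file can silently drop links, here is a rough Python sketch of how I understand the regex URL filter to evaluate its rules: they are tried top to bottom, the first pattern that matches anywhere in the URL decides (+ accepts, - rejects), and a URL that matches no rule is dropped. The rules below are modeled on a typical filter file and the example.com URLs are made up, so treat both as assumptions and compare against your own conf file.

```python
import re

# Rules modeled on a typical regex-urlfilter file (an assumption, not your
# actual config). Each rule is '+' (accept) or '-' (reject) plus a regex.
RULES_TEXT = r"""
# skip file:, ftp: and mailto: URLs
-^(file|ftp|mailto):
# skip URLs containing characters that usually indicate a query
-[?*!@=]
# accept anything else
+.
"""


def load_rules(text):
    """Parse rule lines into (accept, compiled_pattern) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return True if the first matching rule accepts the URL."""
    for accept, pattern in rules:
        if pattern.search(url):  # match anywhere in the URL
            return accept
    return False  # no rule matched: the URL is dropped


rules = load_rules(RULES_TEXT)

# Hypothetical URLs, only for illustration.
for url in [
    "http://example.com/page.html",
    "http://example.com/view?id=7",
    "mailto:someone@example.com",
]:
    print(url, "accepted" if accepts(url, rules) else "filtered out")
```

With rules like these, every link that carries a query string (?id=7 and so on) is rejected, which is a common reason for a crawl picking up far fewer links than the page contains.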
Also, you may want to upgrade to the new 1.4 release; it comes with some important fixes and improvements.

Cheers,

> hi, i crawl one site that it has 100 link in depth 1, and 100 links in
> depth 2, but nutch only crawl 23 links from depth 1 and 30 from depth 2.
> how can i force nutch to crawl all links in depth 1 and 2. i use nutch 1.3
> topN=10000
> depth =2
> and in my nutch-site.xml:
> <property>
> <name>http.content.limit</name>
> <value>-1</value>
> <description>
> </description>
> </property>
> <property>
> <name>http.agent.name</name>
> <value>My Nutch Spider</value>
> <description>
> </description>
> </property>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/error-in-topN-tp3601000p3601000.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

