Hi everyone, I'm running Nutch 1.6 with Solr 3.6.2. I'm trying to crawl only the seed list (depth 1), and it seems that the process ends with only ~255 of the URLs indexed in Solr.
Seed list is about 120K URLs. Fetcher map input is 117K, of which success is 62K and temp_moved is 45K. Parse shows success of 62K. CrawlDB after the fetch shows db_redir_perm=56K, db_unfetched=27K and db_fetched=22K. And finally IndexerStatus shows 20K documents added.

What am I missing?

Thanks!

My nutch-site.xml includes:
-----------------------------------------
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(text|html|tika|metatags|js)|index-(basic|anchor|metadata)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)i</value>

<name>metatags.names</name>
<value>keywords;Keywords;description;Description</value>

<name>index.parse.md</name>
<value>metatag.keywords,metatag.Keywords,metatag.description,metatag.Description</value>

<name>db.update.additions.allowed</name>
<value>false</value>

<name>generate.count.mode</name>
<value>domain</value>

<name>partition.url.mode</name>
<value>byDomain</value>

<name>file.content.limit</name>
<value>262144</value>

<name>http.content.limit</name>
<value>262144</value>

<name>parse.filter.urls</name>
<value>true</value>

<name>parse.normalize.urls</name>
<value>true</value>
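
To make the drop-off between stages easier to see, here is a rough tally of the counts quoted above. It is only a sketch: the figures are the rounded values (in thousands) from my job output, and the variable names are my own labels, not Nutch counters.

```python
# Rounded counts in thousands, copied from the figures quoted in this post.
# Variable names are illustrative labels, not actual Nutch counter names.
seed_urls = 120
fetcher_input = 117
fetch_success = 62
fetch_temp_moved = 45
crawldb = {"db_redir_perm": 56, "db_unfetched": 27, "db_fetched": 22}
indexed = 20

# How much of the fetcher input the two reported statuses account for.
fetcher_accounted = fetch_success + fetch_temp_moved
fetcher_unaccounted = fetcher_input - fetcher_accounted

# Total of the CrawlDB statuses reported after the fetch.
crawldb_total = sum(crawldb.values())

# Gap between successful fetches and entries marked db_fetched.
fetched_gap = fetch_success - crawldb["db_fetched"]

print(f"fetcher accounted:   {fetcher_accounted}K of {fetcher_input}K "
      f"({fetcher_unaccounted}K with other statuses)")
print(f"CrawlDB total:       {crawldb_total}K")
print(f"success vs fetched:  {fetch_success}K vs {crawldb['db_fetched']}K "
      f"(gap of {fetched_gap}K)")
print(f"indexed vs seed:     {indexed}K of {seed_urls}K "
      f"({100 * indexed / seed_urls:.0f}%)")
```

The part I can't explain is the gap between 62K successful fetches/parses and only 22K db_fetched (and 20K indexed).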

