This looks odd. From what I know, the successfully parsed documents are sent to Solr. Did you check the logs for any exceptions?
What command are you using to index?

On Thu, Feb 28, 2013 at 1:51 PM, Amit Sela <[email protected]> wrote:

> Hi everyone,
>
> I'm running with nutch 1.6 and Solr 3.6.2.
> I'm trying to crawl only the seed list (depth 1) and it seems that the
> process ends with only ~255 of the URLs indexed in Solr.
>
> Seed list is about 120K.
> Fetcher map input is 117K where success is 62K and temp_moved 45K.
> Parse shows success of 62K.
> CrawlDB after the fetch shows db_redir_perm=56K, db_unfetched=27K
> and db_fetched=22K.
>
> And finally IndexerStatus shows 20K documents added.
> What am I missing ?
>
> Thanks!
>
> my nutch-site.xml includes:
> -----------------------------------------
> <name>plugin.includes</name>
>
> <value>protocol-httpclient|urlfilter-regex|parse-(text|html|tika|metatags|js)|index-(basic|anchor|metadata)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)i</value>
> <name>metatags.names</name>
> <value>keywords;Keywords;description;Description</value>
> <name>index.parse.md</name>
>
> <value>metatag.keywords,metatag.Keywords,metatag.description,metatag.Description</value>
> <name>db.update.additions.allowed</name>
> <value>false</value>
> <name>generate.count.mode</name>
> <value>domain</value>
> <name>partition.url.mode</name>
> <value>byDomain</value>
> <name>file.content.limit</name>
> <value>262144</value>
> <name>http.content.limit</name>
> <value>262144</value>
> <name>parse.filter.urls</name>
> <value>true</value>
> <name>parse.normalize.urls</name>
> <value>true</value>
>

--
Kiran Chitturi

