Yes. I have two scripts. The first one is a recrawl script that runs the following steps:

inject
generate
fetch
parse
updatedb
mergesegs
invertlinks
index
dedup
merge

The second one just calls the solrindex command:

bin/nutch solrindex mySolrUrl myDB myLink mySegments

So I'm indexing twice: the first script builds the Lucene index and the second one indexes into Solr.

> Date: Tue, 2 Nov 2010 19:00:43 +0000
> Subject: Re: nutch solrindex doesn't index all the documents
> From: [email protected]
> To: [email protected]
>
> did you run the deduplication before indexing?
>
> On 2 November 2010 00:23, Juan Felix <[email protected]> wrote:
>
> >
> > Hi.
> >
> > I'm trying to index all the documents using solrindex command, but for some
> > reason sometimes it doesn't index all the documents.
> >
> > For example, I saw the crawl db stats and it has 75,031 fetched pages but
> > after index them to solr, the number of documents in solr are 74,827
> >
> > Any Idea? What about the other 204 pages that are not on solr?
> >
> > Thanks
> > Juan Felix
> >
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
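
For reference, here is a rough sketch of the order in which the two scripts call Nutch, written as one dry-run shell script. It is not the actual scripts. myDB, myLink, mySegments and mySolrUrl are the placeholders used in the solrindex call above. The urls directory, myMergedSegments, myIndexes, myIndex and the run helper are hypothetical names added for illustration. The argument order for each step is written from memory of the Nutch 1.x usage messages, so it should be checked against what bin/nutch prints for each command.

```shell
#!/bin/sh
# Dry-run sketch of the two scripts. myDB, myLink, mySegments and mySolrUrl are
# the placeholders from the message; every other name here is hypothetical.
NUTCH=${NUTCH:-bin/nutch}
DRY_RUN=${DRY_RUN:-1}        # 1 = only print each command, 0 = also execute it

CRAWLDB=myDB
LINKDB=myLink
SEGMENTS=mySegments
SOLR_URL=mySolrUrl
URLS=urls                    # hypothetical seed directory
MERGED=myMergedSegments      # hypothetical output directory for mergesegs
INDEXES=myIndexes            # hypothetical directory of Lucene part-indexes
INDEX=myIndex                # hypothetical merged Lucene index

# Print the command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# First script: recrawl, then build the Lucene index.
recrawl() {
    run "$NUTCH" inject "$CRAWLDB" "$URLS"
    run "$NUTCH" generate "$CRAWLDB" "$SEGMENTS"
    # Newest segment under mySegments; a placeholder name if none exists yet.
    segment=$(ls -d "$SEGMENTS"/* 2>/dev/null | tail -n 1)
    : "${segment:=$SEGMENTS/NEWEST_SEGMENT}"
    run "$NUTCH" fetch "$segment"
    run "$NUTCH" parse "$segment"
    run "$NUTCH" updatedb "$CRAWLDB" "$segment"
    run "$NUTCH" mergesegs "$MERGED" -dir "$SEGMENTS"
    run "$NUTCH" invertlinks "$LINKDB" -dir "$MERGED"
    run "$NUTCH" index "$INDEXES" "$CRAWLDB" "$LINKDB" "$MERGED"/*
    run "$NUTCH" dedup "$INDEXES"
    run "$NUTCH" merge "$INDEX" "$INDEXES"
}

# Second script: the single solrindex call quoted in the message.
solr_index() {
    run "$NUTCH" solrindex "$SOLR_URL" "$CRAWLDB" "$LINKDB" "$SEGMENTS"
}

recrawl
solr_index
```

With the default DRY_RUN=1 the script only prints the eleven commands in order, which is enough to compare against the real scripts. Setting DRY_RUN=0 would execute them, and that only makes sense inside a Nutch installation with the placeholder names replaced.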

