Thanks for the reply, Markus.

By doc count I mean the number of documents indexed in Solr. It is the same count that is shown in the stats when the indexing job completes (Indexer: 31125 indexed (add/update)).
What is index-dummy, and how do I use it?

On Mon, Aug 8, 2016 at 1:16 PM, Markus Jelsma <[email protected]> wrote:

> Hello - are you sure you are observing docCount and not maxDoc? I don't
> remember having seen this kind of behaviour in the past years.
>
> If it is docCount, then I'd recommend using the index-dummy backend twice
> and diffing the results so you can see which documents are emitted, or
> not, between indexing jobs. That would allow you to find the records that
> change between jobs.
>
> Also, you mention indexing the same CrawlDB, but that is not all you
> index; the segments matter. If you can reproduce it with the same CrawlDB
> and the same set of segments, unchanged, with index-dummy, it would be
> helpful. If the problem is only reproducible with different sets of
> segments, then there is no problem.
>
> Markus
>
> -----Original message-----
> > From: mark mark <[email protected]>
> > Sent: Monday 8th August 2016 19:39
> > To: [email protected]
> > Subject: Indexing Same CrawlDB Result In Different Indexed Doc Count
> >
> > Hi All,
> >
> > I am using Nutch 1.12 and have observed that indexing the same CrawlDB
> > multiple times gives a different indexed doc count.
> >
> > We indexed from the CrawlDB and noted the indexed doc count, then wiped
> > the whole index from Solr and indexed again. This time the number of
> > documents indexed was less than before.
> >
> > I removed all our customized plugins, but the indexed doc count still
> > varies, and it's reproducible almost every time.
> >
> > Command I used for the crawl:
> > ./crawl seedPath crawlDir -1
> >
> > Command used for indexing to Solr:
> > ./nutch solrindex $SOLR_URL $CRAWLDB_PATH $CRAWLDB_DIR/segments/* -filter
> > -normalize -deleteGone
> >
> > Please suggest.
> >
> > Thanks,
> > Mark
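
The comparison Markus suggests (run the dummy backend twice, then diff what was emitted) could be sketched roughly as below. This is only an illustration under assumptions not confirmed in the thread: that each run's output has been saved to a text file with one `action<TAB>document id` record per line, and that the file names `run1.txt` and `run2.txt` and the helper names `load_actions` and `diff_runs` are hypothetical. Adjust the parsing if the actual dump format differs.

```python
from pathlib import Path


def load_actions(path):
    """Read a dump file and return {doc_id: action}.

    Assumed format (not confirmed by the thread): one record per line,
    'action<TAB>doc_id', e.g. 'add\thttp://example.com/'.
    """
    actions = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        action, _, doc_id = line.partition("\t")
        actions[doc_id] = action  # the last record for an id wins
    return actions


def diff_runs(first, second):
    """Compare two dumps and report which document ids differ between them."""
    a, b = load_actions(first), load_actions(second)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "action_changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    # Hypothetical file names; point these at the two saved outputs.
    report = diff_runs("run1.txt", "run2.txt")
    for label, ids in report.items():
        print(f"{label}: {len(ids)}")
        for doc_id in ids:
            print(f"  {doc_id}")
```

If both runs use the same CrawlDB and the same unchanged segments, all three lists would be expected to be empty; any ids that do show up are the records worth inspecting.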

