Hi Manish,

On Fri, Dec 25, 2015 at 1:43 PM, <[email protected]> wrote:

> Let me explain with example.
>
> Let’s say we have URL A and it is getting redirected to URL B, I see both
> A and B getting indexed, I don’t want to index A when it’s redirecting to
> another URL.

Please pass the -deleteGone flag to the indexer job. The code at the link below shows which CrawlDatum statuses this flag covers.

https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java#L235-L252

Thanks
Lewis
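For illustration only, here is a rough, self-contained sketch of the kind of check the linked snippet performs when -deleteGone is set. It is not the Nutch source: the Status enum and the shouldDelete helper are hypothetical stand-ins for the CrawlDatum status constants, and I am assuming that both "gone" and redirect statuses (temporary and permanent) trigger a delete. Please treat the linked code as the authority.

```java
import java.util.EnumSet;
import java.util.Set;

// Simplified model of the -deleteGone decision; NOT the actual Nutch code.
final class DeleteGoneSketch {

  // Hypothetical stand-in for the CrawlDatum status constants.
  enum Status {
    FETCH_SUCCESS, FETCH_GONE, FETCH_REDIR_TEMP, FETCH_REDIR_PERM,
    DB_FETCHED, DB_GONE, DB_REDIR_TEMP, DB_REDIR_PERM
  }

  // Statuses assumed to mean "this URL no longer serves its own content".
  private static final Set<Status> GONE_OR_REDIRECT = EnumSet.of(
      Status.FETCH_GONE, Status.DB_GONE,
      Status.FETCH_REDIR_TEMP, Status.FETCH_REDIR_PERM,
      Status.DB_REDIR_TEMP, Status.DB_REDIR_PERM);

  // Returns true when the document for this URL should be removed from the index.
  static boolean shouldDelete(boolean deleteGone, Status fetchStatus, Status dbStatus) {
    if (!deleteGone) {
      return false; // without the flag, nothing is deleted on status alone
    }
    return GONE_OR_REDIRECT.contains(fetchStatus) || GONE_OR_REDIRECT.contains(dbStatus);
  }

  public static void main(String[] args) {
    // URL A redirects to URL B: with the flag set, A is marked for deletion.
    System.out.println(shouldDelete(true, Status.FETCH_REDIR_PERM, Status.DB_REDIR_PERM));
    // URL B was fetched normally: it stays in the index.
    System.out.println(shouldDelete(true, Status.FETCH_SUCCESS, Status.DB_FETCHED));
  }
}
```

In this model, URL A (the redirect source) is dropped while URL B (the redirect target) is kept, which matches the behaviour you are asking for.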

