Hi Manish,

On Fri, Dec 25, 2015 at 1:43 PM, <[email protected]> wrote:

>
>
> Let me explain with example.
>
> Let’s say we have URL A and it is getting redirected to URL B , I see both
> A and B getting indexed, I don’t want to index A when it’s redirecting to
> another URL.
>
>
Please pass the -deleteGone flag to the indexer job.
The CrawlDatum statuses this flag accounts for can be seen in the snippet
linked below.

https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java#L235-L252
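
As a rough illustration of the check described above, here is my own
simplified sketch. It is not the Nutch source: the Status enum and the
shouldDelete() helper are hypothetical stand-ins, and the enum names only
mirror the CrawlDatum status constants. As I understand it, with -deleteGone
set, a URL whose fetch or db status is "gone" or a redirect (temporary or
permanent) is sent to the index as a delete instead of being indexed.

```java
// Simplified, self-contained sketch of the -deleteGone decision.
// NOT the Nutch source: Status and shouldDelete() are hypothetical
// stand-ins; the names only mirror the CrawlDatum status constants.
final class DeleteGoneSketch {

    enum Status {
        FETCH_SUCCESS, FETCH_GONE, FETCH_REDIR_TEMP, FETCH_REDIR_PERM,
        DB_FETCHED, DB_GONE, DB_REDIR_TEMP, DB_REDIR_PERM
    }

    // Returns true when the URL should be deleted from the index
    // rather than indexed.
    static boolean shouldDelete(boolean deleteGone, Status fetchStatus, Status dbStatus) {
        // Without the flag, or without both datums, nothing is deleted.
        if (!deleteGone || fetchStatus == null || dbStatus == null) {
            return false;
        }
        boolean gone = fetchStatus == Status.FETCH_GONE
                || dbStatus == Status.DB_GONE;
        boolean redirect = fetchStatus == Status.FETCH_REDIR_PERM
                || fetchStatus == Status.FETCH_REDIR_TEMP
                || dbStatus == Status.DB_REDIR_PERM
                || dbStatus == Status.DB_REDIR_TEMP;
        return gone || redirect;
    }

    public static void main(String[] args) {
        // URL A redirects to URL B; URL B was fetched normally.
        System.out.println("delete URL A: "
                + shouldDelete(true, Status.FETCH_REDIR_PERM, Status.DB_REDIR_PERM));
        System.out.println("delete URL B: "
                + shouldDelete(true, Status.FETCH_SUCCESS, Status.DB_FETCHED));
    }
}
```

In your example, URL A would fall into the redirect branch and be removed,
while URL B would be indexed as usual.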

Thanks
Lewis
