Hi,

> I see that when a page has JS redirects, Nutch creates docs for both the
> original and the redirected page.

Unfortunately, this also happens for ordinary non-JS content-level redirects,
see https://issues.apache.org/jira/browse/NUTCH-685
Handling them is a bit more complex than handling HTTP redirects because the
redirect status is known only after the document has been parsed.
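
To illustrate the difference, here is a rough sketch in Python. It is not
Nutch code: the names http_redirect_target, content_redirect_target and
MetaRefreshFinder are made up for this example, and a meta refresh tag is
assumed as the content-level redirect. An HTTP redirect can be decided from
the status code and headers alone, while a content-level redirect only
shows up once the fetched body has been parsed.

```python
import re
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/new".
        match = re.search(r"url\s*=\s*['\"]?([^'\";]+)",
                          attr.get("content", ""), re.I)
        if match:
            self.target = match.group(1).strip()


def http_redirect_target(status, headers):
    """HTTP-level redirect: known from the status line and headers."""
    if status in (301, 302, 303, 307, 308):
        return headers.get("Location")
    return None


def content_redirect_target(body):
    """Content-level redirect: known only after parsing the document."""
    finder = MetaRefreshFinder()
    finder.feed(body)
    return finder.target
```

In the second case the page has already been fetched and parsed by the time
the redirect is discovered, so the original URL needs extra handling if it
should not end up as a document of its own.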

Sebastian

On 07/05/2016 09:52 PM, Manish Verma wrote:
> Hi,
> 
> Nutch 1.12 URL redirect scenario:
> #1 Is there any way to keep the original URL from being indexed? I see that
> when a page has JS redirects, Nutch creates docs for both the original and
> the redirected page.
> #2 I see the page title is becoming part of the page content. Is it
> configurable to exclude the title from the content?
> 
> Regards,
> Manish Verma
> AML Search
> 
> 
