Hi,

> I see when page has JS redirects nutch create docs for both original and
> redirected page.

Unfortunately, this is true even for ordinary non-JS content-level redirects, see
https://issues.apache.org/jira/browse/NUTCH-685

It's a bit more complex than with HTTP redirects because the redirect status is
known only after parsing the document.

Sebastian

On 07/05/2016 09:52 PM, Manish Verma wrote:
> Hi,
>
> Nutch 1.12 Url redirect scenario -
> #1 Is there any way to skip original url from getting indexed ? I see when
> page has JS redirects nutch create docs for both original and redirected page.
> #2 I see page title is becoming part of page content, is it configurable to
> exclude title from content ?
>
> Regards,
> Manish Verma
> AML Search

