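Hello Hany,

Using parse-tika as your HTML parser, you can enable Boilerpipe, which strips boilerplate such as headers and footers from the extracted text (see the tika.extractor properties in nutch-default.xml).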
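
As a rough sketch of what that could look like in conf/nutch-site.xml: the two tika.extractor property names below are the ones I recall from nutch-default.xml, so please check them against the copy shipped with your version. The plugin.includes value is only an illustrative assumption; keep your own plugin list and make sure HTML is actually handled by parse-tika rather than parse-html.

```xml
<!-- conf/nutch-site.xml: overrides for nutch-default.xml (illustrative sketch) -->
<configuration>

  <!-- Assumption: example plugin list only. Adapt it to your setup;
       the point is that parse-tika (not parse-html) parses HTML. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-tika|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

  <!-- Switch Tika text extraction from "none" to Boilerpipe. -->
  <property>
    <name>tika.extractor</name>
    <value>boilerpipe</value>
  </property>

  <!-- Boilerpipe algorithm: DefaultExtractor, ArticleExtractor
       or CanolaExtractor. -->
  <property>
    <name>tika.extractor.boilerpipe.algorithm</name>
    <value>ArticleExtractor</value>
  </property>

</configuration>
```

Boilerpipe decides heuristically what counts as boilerplate, so it is worth reparsing a few sample pages and checking the extracted text before reindexing everything.
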
Regards,
Markus

-----Original message-----
> From: hany.n...@hsbc.com <hany.n...@hsbc.com>
> Sent: Wednesday 14th November 2018 15:53
> To: user@nutch.apache.org
> Subject: Block certain parts of HTML code from being indexed
>
> Hello All,
>
> I am using Nutch 1.15, and wondering if there is a feature for blocking
> certain parts of HTML code from being indexed (header & footer).
>
> Kind regards,
> Hany Shehata
> Solutions Architect, Marketing and Communications IT
> Corporate Functions | HSBC Operations, Services and Technology (HOST)