Re: Block certain parts of HTML code from being indexed

2018-11-16 Thread Semyon Semyonov
/DOMContentUtils.java at private boolean getTextHelper(StringBuffer sb, Node node, boolean abortOnNestedAnchors, int anchorDepth)

Semyon

Sent: Friday, November 16, 2018 at 10:34 AM
From: "Jorge Betancourt"
To: user@nutch.apache.org
Subject: Re: Block certain parts of HTML code from being indexed
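Semyon's pointer is to the recursive text collection in DOMContentUtils: if a DOM subtree is skipped there, its text never reaches the index. The pruning idea can be sketched as follows. This is a minimal sketch under stated assumptions, not a patch for Nutch. It uses only the JDK's built-in DOM parser, so it assumes well-formed XHTML, whereas Nutch parses real-world HTML leniently. The class name, method names, the skipped tag set and the "noindex" class marker are all hypothetical illustrations, not Nutch APIs or configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SectionSkippingTextExtractor {

    // Hypothetical: elements whose entire subtree should stay out of the index.
    private static final Set<String> SKIP_TAGS = Set.of("header", "footer", "nav", "script", "style");

    // Hypothetical marker class an author could add to any component to exclude it.
    private static final String SKIP_CLASS = "noindex";

    /** True if this element, and everything under it, should be excluded. */
    static boolean shouldSkip(Element el) {
        if (SKIP_TAGS.contains(el.getTagName().toLowerCase(Locale.ROOT))) {
            return true;
        }
        // The class attribute is a whitespace-separated token list.
        for (String cls : el.getAttribute("class").trim().split("\\s+")) {
            if (cls.equals(SKIP_CLASS)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Depth-first walk that appends text nodes to sb and returns early on skipped
     * elements, so their text is never collected. A simplified analogue of the
     * recursion in Nutch's getTextHelper, not its actual code.
     */
    static void getText(StringBuilder sb, Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE && shouldSkip((Element) node)) {
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            getText(sb, children.item(i));
        }
    }

    /** Parses well-formed XHTML and returns the text that would be indexed. */
    public static String extractText(String xhtml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            StringBuilder sb = new StringBuilder();
            getText(sb, doc.getDocumentElement());
            return sb.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse input as XHTML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<header>Site menu</header>"
                + "<div class=\"content\">Main article text</div>"
                + "<div class=\"promo noindex\">Cookie banner</div>"
                + "<footer>Copyright notice</footer>"
                + "</body></html>";
        System.out.println(extractText(page));
    }
}
```

Running main prints "Main article text": the header, the footer and the div carrying the marker class are pruned as whole subtrees. Checking at the element level means a single test excludes a component no matter how deeply its text is nested.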

Re: Block certain parts of HTML code from being indexed

2018-11-16 Thread Jorge Betancourt
> > -----Original Message-----
> > From: Hany NASR
> > Sent: Thursday, November 15, 2018 4:18 PM
> > To: user@nutch.apache.org
> > Subject: RE: Block certain parts of HTML code from being indexed

RE: Block certain parts of HTML code from being indexed

2018-11-16 Thread hany.nasr
Sent: Thursday, November 15, 2018 4:18 PM
To: user@nutch.apache.org
Subject: RE: Block certain parts of HTML code from being indexed

Hello Markus,

What if I want to remove a specific component or page section?

Kind regards,
Hany Shehata
Solutions Architect, Marketing and Communications IT Corporate

RE: Block certain parts of HTML code from being indexed

2018-11-15 Thread hany.nasr
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, November 14, 2018 4:11 PM
To: user@nutch.apache.org
Subject: RE: Block certain parts of HTML code from being indexed

Hello Hany,

Using parse-tika as your HTML parser, you can enable Boilerpipe (see nutch-default).

RE: Block certain parts of HTML code from being indexed

2018-11-14 Thread Markus Jelsma
Hello Hany,

Using parse-tika as your HTML parser, you can enable Boilerpipe (see nutch-default).

Regards,
Markus

> -----Original message-----
> From: hany.n...@hsbc.com
> Sent: Wednesday 14th November 2018 15:53
> To: user@nutch.apache.org
> Subject: Block certain parts of HTML code from being indexed

RE: Block certain parts of HTML code from being indexed

2018-11-14 Thread Yossi Tamari
Hi Hany,

The Tika parser supports Boilerpipe for header and footer removal, but I don't know how well it works. You can test it online at https://boilerpipe-web.appspot.com/

> -----Original Message-----
> From: hany.n...@hsbc.com
> Sent: 14 November 2018 16:53
> To: user@nutch.apache.org
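To make the idea behind Boilerpipe's header and footer removal more concrete, here is a toy sketch loosely modelled on two of the shallow text features its extractors build on: how many words a text block has, and what fraction of those words are link text (link density). Navigation bars and footers tend to be short blocks made mostly of links. The Block class, the thresholds and the sample blocks are hypothetical, and this is not Boilerpipe's actual classifier; it only illustrates why such a filter can drop menus while keeping body text.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkDensityFilter {

    /** A text block plus how many of its words sit inside link (anchor) text. */
    static final class Block {
        final String text;
        final int linkedWords;

        Block(String text, int linkedWords) {
            this.text = text;
            this.linkedWords = linkedWords;
        }

        /** Number of whitespace-separated words in the block. */
        int words() {
            String t = text.trim();
            return t.isEmpty() ? 0 : t.split("\\s+").length;
        }

        /** Fraction of the block's words that are link text. */
        double linkDensity() {
            int w = words();
            return w == 0 ? 0.0 : (double) linkedWords / w;
        }
    }

    // Hypothetical thresholds chosen for illustration; not Boilerpipe's values.
    static final double MAX_LINK_DENSITY = 0.33;
    static final int MIN_WORDS = 5;

    /** Keeps blocks that look like content: enough words, few of them links. */
    static List<String> keepContent(List<Block> blocks) {
        List<String> kept = new ArrayList<>();
        for (Block b : blocks) {
            if (b.words() >= MIN_WORDS && b.linkDensity() <= MAX_LINK_DENSITY) {
                kept.add(b.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Block> page = List.of(
                new Block("Home Products About Contact", 4),            // navigation menu
                new Block("This paragraph holds the actual article content for the page.", 0),
                new Block("Privacy Terms Sitemap Careers Press Jobs", 6), // footer links
                new Block("Read more", 2));
        System.out.println(keepContent(page));
    }
}
```

Running main prints a list containing only the article paragraph. Boilerpipe's own classifiers also look at neighbouring blocks, so a real page can behave differently from this toy, which is why trying actual pages in the online demo remains the practical test.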