/DOMContentUtils.java
at private boolean getTextHelper(StringBuffer sb, Node node, boolean abortOnNestedAnchors, int anchorDepth)
Semyon
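
For illustration only, here is a minimal sketch of how a recursive DOM text walk of the kind suggested by the getTextHelper signature above could skip selected page sections. It is not the Nutch implementation: the class name, the SKIP_TAGS and SKIP_IDS sets, and the collectText and extract helpers are all hypothetical, and the sketch assumes well-formed XHTML input so that the JDK's own XML parser can be used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SkipSectionsSketch {
    // Hypothetical exclusion lists; adjust to the markup of the crawled site.
    static final Set<String> SKIP_TAGS = Set.of("header", "footer", "nav");
    static final Set<String> SKIP_IDS = Set.of("sidebar");

    // True if the node is an element whose tag name or id is on an exclusion list.
    static boolean isExcluded(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        Element el = (Element) node;
        return SKIP_TAGS.contains(el.getTagName().toLowerCase(Locale.ROOT))
                || SKIP_IDS.contains(el.getAttribute("id"));
    }

    // Depth-first walk that appends text nodes and never descends into excluded subtrees.
    static void collectText(StringBuilder sb, Node node) {
        if (isExcluded(node)) {
            return;
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(sb, children.item(i));
        }
    }

    // Parses a well-formed XHTML string and returns the text outside excluded sections.
    static String extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder sb = new StringBuilder();
        collectText(sb, doc.getDocumentElement());
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><header>Site menu</header>"
                + "<div id=\"sidebar\">Ads</div>"
                + "<p>Main <b>article</b> text.</p>"
                + "<footer>Copyright</footer></body></html>";
        System.out.println(extract(page)); // prints "Main article text."
    }
}
```

In a real Nutch setup the same check would presumably sit inside the existing recursion, with the exclusion lists read from configuration rather than hard-coded.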
Sent: Friday, November 16, 2018 at 10:34 AM
From: "Jorge Betancourt"
To: user@nutch.apache.org
Subject: Re: Block certain parts of HTML code from being indexed
> > -----Original Message-----
> > From: Hany NASR
> > Sent: Thursday, November 15, 2018 4:18 PM
> > To: user@nutch.apache.org
> > Subject: RE: Block certain parts of HTML code from being indexed
Hello Markus,
What if I want to remove a specific component or page section?
Kind regards,
Hany Shehata
Solutions Architect, Marketing and Communications IT Corporate
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, November 14, 2018 4:11 PM
To: user@nutch.apache.org
Subject: RE: Block certain parts of HTML code from being indexed
Hello Hany,
Using parse-tika as your HTML parser, you can enable Boilerpipe (see
nutch-default).
Regards,
Markus
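
As a rough illustration of the suggestion above, a nutch-site.xml override might look like the fragment below. The property names and values are given as I recall them from nutch-default.xml, so please check them against the nutch-default.xml shipped with your Nutch version. The choice of ArticleExtractor is only an example, and parse-tika must also be enabled in plugin.includes.

```xml
<!-- conf/nutch-site.xml: sketch only, verify names against your nutch-default.xml -->
<property>
  <name>tika.extractor</name>
  <!-- "none" is the default; "boilerpipe" turns on boilerplate removal -->
  <value>boilerpipe</value>
</property>
<property>
  <name>tika.extractor.boilerpipe.algorithm</name>
  <!-- Example choice; DefaultExtractor and CanolaExtractor are the other documented values -->
  <value>ArticleExtractor</value>
</property>
```

Boilerpipe decides heuristically what counts as boilerplate, so it does not let you name a specific component or page section to drop.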
-----Original message-----
> From: hany.n...@hsbc.com
> Sent: Wednesday 14th November 2018 15:53
> To: user@nutch.apache.org
> Subject: Block certain parts of HTML code from being indexed
Hi Hany,
The Tika parser supports Boilerpipe for header and footer removal, but I don't
know how well it works.
You can test it online at https://boilerpipe-web.appspot.com/
> -----Original Message-----
> From: hany.n...@hsbc.com
> Sent: 14 November 2018 16:53
> To: user@nutch.apache.org
>