/DOMContentUtils.java
at private boolean getTextHelper(StringBuffer sb, Node node, boolean abortOnNestedAnchors, int anchorDepth)
Semyon
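Semyon's pointer suggests customizing the recursive text extraction in getTextHelper so that unwanted sections never reach the indexed text. Below is a minimal standalone sketch of that idea. It assumes the sections to drop can be recognized by an id or a class token. It is not Nutch's DOMContentUtils code: the class SkippingTextWalker and the blocklist values cookie-banner and noindex are hypothetical, and the JDK XML parser used here only accepts well-formed XHTML, so it stands in for the HTML parser a real crawl would use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class SkippingTextWalker {

    // Hypothetical markers for page sections that should not be indexed.
    private static final Set<String> SKIP_IDS =
            new HashSet<>(Arrays.asList("cookie-banner"));
    private static final Set<String> SKIP_CLASSES =
            new HashSet<>(Arrays.asList("noindex"));

    /** True if the node is an element whose id or any class token is blocklisted. */
    static boolean isBlocked(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        Element el = (Element) node;
        // getAttribute returns "" when the attribute is absent.
        if (SKIP_IDS.contains(el.getAttribute("id"))) {
            return true;
        }
        for (String token : el.getAttribute("class").trim().split("\\s+")) {
            if (SKIP_CLASSES.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Depth-first walk that appends text but prunes whole blocked subtrees. */
    static void getText(StringBuilder sb, Node node) {
        if (isBlocked(node)) {
            return; // skip this element and everything inside it
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            getText(sb, children.item(i));
        }
    }

    /** Parses well-formed XHTML and returns its text minus blocked sections. */
    static String extract(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder sb = new StringBuilder();
        getText(sb, doc.getDocumentElement());
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body>"
                + "<div id=\"cookie-banner\">We use cookies</div>"
                + "<div class=\"teaser noindex\">Related offers</div>"
                + "<div class=\"content\">Main article text</div>"
                + "</body></html>";
        System.out.println(extract(page)); // prints "Main article text"
    }
}
```

Returning early before recursing is what drops a whole subtree at once; in a real plugin the blocklists would probably be read from configuration rather than hard-coded.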
Sent: Friday, November 16, 2018 at 10:34 AM
From: "Jorge Betancourt"
To: user@nutch.apache.org
Subject: Re: Block certain parts of HTML code from being indexed
Protect our environment - please only print this if you have to!
-----Original Message-----
From: Hany NASR
Sent: Thursday, November 15, 2018 4:18 PM
To: user@nutch.apache.org
Subject: RE: Block certain parts of HTML code from being indexed
Hello Markus,
What if I want to remove a specific component or page section?
Kind regards,
Hany Shehata
Solutions Architect, Marketing and Communications IT
Corporate Functions | HSBC Operations, Services and Technology (HOST)
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Wednesday, November 14, 2018 4:11 PM
To: user@nutch.apache.org
Subject: RE: Block certain parts of HTML code from being indexed
Hello Hany,
Using parse-tika as your HTML parser, you can enable Boilerpipe (see nutch-default).
Regards,
Markus
-----Original message-----
> From: hany.n...@hsbc.com
> Sent: Wednesday 14th November 2018 15:53
> To: user@nutch.apache.org
> Subject: Block certain parts of HTML code from being indexed
>
> Hello All,
>
> I am using Nutch 1.15, and wondering if there is a feature for blocking certain
> parts of HTML code from being indexed (header & footer).
>
> Kind regards,
> Hany Shehata
> Solutions Architect, Marketing and Communications IT
> Corporate Functions | HSBC Operations, Services and Technology (HOST)
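For the original header-and-footer case, a different approach is to prune the matching elements from the DOM first and then take all remaining text in one call. This sketch assumes the site marks those regions with the HTML5 header and footer elements and that the markup is well-formed XHTML (the JDK parser rejects tag soup). HeaderFooterPruner is a hypothetical name, and this is neither Nutch's parser code nor Boilerpipe, which detects boilerplate heuristically rather than by element name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HeaderFooterPruner {

    // Hypothetical list of layout elements to drop before indexing.
    private static final String[] PRUNED_TAGS = {"header", "footer"};

    /** Removes every element named in PRUNED_TAGS from the document. */
    static void prune(Document doc) {
        for (String tag : PRUNED_TAGS) {
            NodeList live = doc.getElementsByTagName(tag);
            // Snapshot first: the NodeList is live and shrinks as nodes are removed.
            List<Node> matches = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                matches.add(live.item(i));
            }
            for (Node n : matches) {
                // A match nested inside an earlier match is removed from its
                // already detached ancestor, which is harmless.
                n.getParentNode().removeChild(n);
            }
        }
    }

    /** Parses well-formed XHTML, prunes layout elements, returns remaining text. */
    static String indexableText(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        prune(doc);
        // getTextContent concatenates all remaining descendant text nodes.
        return doc.getDocumentElement().getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><header>Top menu</header>"
                + "<p>Article body</p><footer>Contact us</footer></body></html>";
        System.out.println(indexableText(page)); // prints "Article body"
    }
}
```

HTML5 also allows header and footer inside article or section elements, so this blunt pruning can drop article titles or bylines; restricting it to direct children of body would be one possible refinement.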