There is no built-in mechanism for this. However, are you sure you really want
a parser for each website, rather than a parse filter for each website (which
would take the output of the HTML parser and apply domain-specific
customizations)?
In both cases you can use a dispatcher approach,
Is there any reason why writing an `HtmlParseFilter` would not be enough?
The HTML parser runs its own logic and passes a DOM representation to all
the filters; you can then extract your own data from the DOM tree.
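A minimal sketch of the dispatcher idea follows. It assumes you choose per-site logic by the URL's host and fall back to a default handler for unknown sites. `SiteDispatcher`, `register` and `dispatch` are hypothetical names, not Nutch API. In a real plugin this lookup would sit inside your `HtmlParseFilter` implementation, and each handler would read from the DOM that the HTML parser hands to the filters. The Nutch types are left out here so the sketch compiles on its own, and page text stands in for the DOM.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: picks a per-site extraction step based on the URL host.
public class SiteDispatcher {
    // host (lower-cased) -> handler that turns page content into extracted fields
    private final Map<String, Function<String, Map<String, String>>> handlers = new HashMap<>();
    private final Function<String, Map<String, String>> fallback;

    public SiteDispatcher(Function<String, Map<String, String>> fallback) {
        this.fallback = fallback;
    }

    // Register site-specific logic for one host.
    public void register(String host, Function<String, Map<String, String>> handler) {
        handlers.put(host.toLowerCase(), handler);
    }

    // Run the handler registered for the URL's host, or the fallback if none matches.
    public Map<String, String> dispatch(String url, String content) {
        String host = URI.create(url).getHost();
        Function<String, Map<String, String>> handler =
                (host == null) ? fallback : handlers.getOrDefault(host.toLowerCase(), fallback);
        return handler.apply(content);
    }

    public static void main(String[] args) {
        SiteDispatcher d = new SiteDispatcher(text -> Map.of("handler", "default"));
        d.register("www.example.com",
                text -> Map.of("handler", "example", "length", String.valueOf(text.length())));
        System.out.println(d.dispatch("https://www.example.com/a", "hello").get("handler")); // example
        System.out.println(d.dispatch("https://other.org/", "hello").get("handler"));        // default
    }
}
```

Keeping one filter with a lookup table, rather than one parser per site, leaves the mimetype-based parser selection untouched and confines the site-specific code to small handlers.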
At the moment individual parsers are matched by mimetype (see
Is there a way in Nutch to use a different parser for different
websites?
I am trying to do this by writing a custom parser that calls different
parsers for different websites.
On 14 Mar 2018 14:19, "Semyon Semyonov" wrote:
As a side note,
I had to implement my own parser with extra functionality; a simple copy/paste
of the HtmlParser code did the job.
If you want to inherit from it instead of copying and pasting, that may be a bad
idea altogether. HtmlParser is a concrete, non-abstract class, so inheritance
will not be so
One suggestion I can make is to ensure that the parse-html plugin is built
before your plugin (since you are including the jars generated by its
build).
> -Original Message-
> From: Yash Thenuan Thenuan
> Sent: 14 March 2018 09:55
> To:
Hi,
It didn't work in ant runtime.
I included "import org.apache.nutch.parse.html;" in my custom parser code,
but it throws an error when I run ant runtime.
[javac]
Hi Yash,
I don't know how to do it and have never tried, but if I had to, it would be a
matter of trial and error.
If you want to increase the chances that someone will answer your question, I
suggest you provide as much information as possible:
Where did it not work? In "ant runtime", or when running
Can anybody please help me with this?
On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <rit2014...@iiita.ac.in> wrote:
> I am trying to import HtmlParser in my custom parser.
> I did it the same way HtmlParser imports lib-nekohtml, but it
> didn't work.
> Can anybody