Sent: 15 March 2018 10:26
> To: user@nutch.apache.org
> Subject: RE: RE: Dependency between plugins
>
> Yes, I am using the HTML parser, and yes, the document is getting parsed, but
> the document fragment is printing null.
>
> On 15 Mar 2018 13:52, "Yossi Tamari" <yossi.tam...@pi
eally recommend debugging in local mode rather than using sysout.
>
> > -Original Message-
> > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> > Sent: 15 March 2018 10:13
> > To: user@nutch.apache.org
> > Subject: RE: RE: Dependency between pl
r@nutch.apache.org
> Subject: RE: RE: Dependency between plugins
>
> I tried printing the contents of the document fragment in parsefilter-regex by
> writing
> System.out.println(doc), but it's printing null!! And the document is getting
> parsed!!
>
> On 15 Mar 2018 13:15, "
ent as their fourth parameter.
>
> > -Original Message-
> > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> > Sent: 15 March 2018 08:50
> > To: user@nutch.apache.org
> > Subject: Re: RE: Dependency between plugins
> >
> > Hi Jorge and Yos
Parse filters receive a DocumentFragment as their fourth parameter.
> -Original Message-
> From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> Sent: 15 March 2018 08:50
> To: user@nutch.apache.org
> Subject: Re: RE: Dependency between plugins
>
> Hi Jorge an
sage-
> > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> > Sent: 14 March 2018 15:28
> > To: user@nutch.apache.org
> > Subject: Re: RE: Dependency between plugins
> >
> > Is there a way in nutch by which we can use different parser for
> diff
nal Message-
> From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in>
> Sent: 14 March 2018 15:28
> To: user@nutch.apache.org
> Subject: Re: RE: Dependency between plugins
>
> Is there a way in Nutch to use a different parser for different
> websites?
>
Is there any reason why writing a `HtmlParseFilter` would not be enough?
The HTML parser will execute its own logic and provide a DOM representation
to all the filters and you can extract your own data from the DOM tree.
At the moment individual parsers are matched by mimetype (see
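
To illustrate the point about extracting data from the DOM tree, here is a minimal sketch that uses only the JDK's org.w3c.dom API. It is not Nutch code: `FragmentWalker`, `collectText` and `fragmentOf` are hypothetical names, and `fragmentOf` only stands in for whatever the HTML parser hands to a filter. The assumption is that a filter would apply the same kind of walk to the DocumentFragment it receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: walks a DOM tree and collects text.
final class FragmentWalker {

    /** Returns the trimmed text of every element named tagName under root. */
    static List<String> collectText(Node root, String tagName) {
        List<String> out = new ArrayList<>();
        if (root == null) {
            return out; // nothing to walk if the caller got no fragment at all
        }
        walk(root, tagName, out);
        return out;
    }

    // Depth-first traversal over first-child / next-sibling links.
    private static void walk(Node node, String tagName, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && tagName.equalsIgnoreCase(node.getNodeName())) {
            out.add(node.getTextContent().trim());
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, tagName, out);
        }
    }

    /** Stand-in for parser output: builds a fragment from well-formed markup. */
    static DocumentFragment fragmentOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DocumentFragment fragment = doc.createDocumentFragment();
            fragment.appendChild(doc.getDocumentElement());
            return fragment;
        } catch (Exception e) {
            throw new IllegalStateException("could not build sample fragment", e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment doc = fragmentOf(
                "<html><body><h1>Title</h1><p>first</p><p>second</p></body></html>");
        System.out.println(collectText(doc, "p"));
    }
}
```

Note that the sketch reads the fragment through its child nodes rather than printing the fragment object itself. Depending on the DOM implementation, a node's toString() may show only its name and node value instead of its content.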
Is there a way in Nutch to use a different parser for different
websites?
I am trying to do this by writing a custom parser which will call different
parsers for different websites.
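
One way to picture per-website handling is a small dispatch table keyed by host name. The sketch below is plain Java and does not touch the Nutch API: `HostDispatcher`, `register` and `extract` are hypothetical names, the "extractors" are simple string functions standing in for site-specific logic, and it assumes the page URL is available at the point where the dispatch happens.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Nutch code: picks site-specific logic by host name.
final class HostDispatcher {
    private final Map<String, Function<String, String>> extractorsByHost = new HashMap<>();
    private final Function<String, String> fallback;

    HostDispatcher(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    /** Registers an extractor for one host; host names are compared in lower case. */
    HostDispatcher register(String host, Function<String, String> extractor) {
        extractorsByHost.put(host.toLowerCase(Locale.ROOT), extractor);
        return this;
    }

    /** Applies the extractor registered for the URL's host, or the fallback. */
    String extract(String url, String text) {
        String host = URI.create(url).getHost();
        Function<String, String> extractor = (host == null)
                ? fallback
                : extractorsByHost.getOrDefault(host.toLowerCase(Locale.ROOT), fallback);
        return extractor.apply(text);
    }

    public static void main(String[] args) {
        HostDispatcher dispatcher = new HostDispatcher(text -> "generic:" + text)
                .register("news.example.com", text -> "news:" + text);
        System.out.println(dispatcher.extract("https://news.example.com/a", "x"));
        System.out.println(dispatcher.extract("https://other.example.org/b", "y"));
    }
}
```

Hosts without a registered entry fall through to the generic extractor, so adding a new website only means adding one more `register` call.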
On 14 Mar 2018 14:19, "Semyon Semyonov" wrote:
> As a side note,
>
> I had
As a side note,
I had to implement my own parser with extra functionality; a simple copy/paste
of the HTML parser's code did the job.
If you want to inherit instead of copy/paste, that may be a bad idea. The HTML
parser is a concrete, non-abstract class, therefore the inheritance will not be
so