RE: RE: Dependency between plugins

2018-03-15 Thread Yossi Tamari
Sent: 15 March 2018 10:26 > To: user@nutch.apache.org > Subject: RE: RE: Dependency between plugins > > Yes I am using Html parser and yes the document is getting parsed but > document fragment is printing null. > > On 15 Mar 2018 13:52, "Yossi Tamari" <yossi.tam...@pi

RE: RE: Dependency between plugins

2018-03-15 Thread Yash Thenuan Thenuan
eally recommend debugging in local mode rather than using sysout. > > > -Original Message- > > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > > Sent: 15 March 2018 10:13 > > To: user@nutch.apache.org > > Subject: RE: RE: Dependency between pl

RE: RE: Dependency between plugins

2018-03-15 Thread Yossi Tamari
r@nutch.apache.org > Subject: RE: RE: Dependency between plugins > > I tried printing the contents of the document fragment in parsefilter-regex by > writing > System.out.println(doc) but it's printing null!! And the document is getting > parsed!! > > On 15 Mar 2018 13:15, "

RE: RE: Dependency between plugins

2018-03-15 Thread Yash Thenuan Thenuan
ent as their fourth parameter. > > > -Original Message- > > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > > Sent: 15 March 2018 08:50 > > To: user@nutch.apache.org > > Subject: Re: RE: Dependency between plugins > > > > Hi Jorge and Yos

RE: RE: Dependency between plugins

2018-03-15 Thread Yossi Tamari
Parse filters receive a DocumentFragment as their fourth parameter. > -Original Message- > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > Sent: 15 March 2018 08:50 > To: user@nutch.apache.org > Subject: Re: RE: Dependency between plugins > > Hi Jorge an
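
For readers following the thread: printing a DOM node with System.out.println generally does not serialize its contents, so it can help to walk the fragment explicitly. The sketch below uses only the JDK's org.w3c.dom classes and is not taken from Nutch; the class FragmentText and its methods are hypothetical names, and sampleFragment() merely stands in for the DocumentFragment that a parse filter would receive as its fourth parameter.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: collects the text under a DocumentFragment.
class FragmentText {

    // Appends the value of every text node under `node`, depth first.
    static void collectText(Node node, StringBuilder out) {
        if (node == null) {
            return; // a null fragment simply yields no text
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    // Returns the concatenated, trimmed text of the fragment ("" for null).
    static String textOf(DocumentFragment doc) {
        StringBuilder out = new StringBuilder();
        collectText(doc, out);
        return out.toString().trim();
    }

    // Assumption: builds a small fragment from a well-formed string, for demonstration only.
    static DocumentFragment sampleFragment() throws Exception {
        String xml = "<html><body><h1>Title</h1><p>Body text</p></body></html>";
        Document d = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        DocumentFragment frag = d.createDocumentFragment();
        frag.appendChild(d.getDocumentElement()); // moves the root element into the fragment
        return frag;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf(sampleFragment()));
    }
}
```

Inside a real parse filter, a call such as textOf(doc) on the fourth parameter would be the analogous step; whether the fragment is populated still depends on which parser produced it.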

Re: RE: Dependency between plugins

2018-03-15 Thread Yash Thenuan Thenuan
sage- > > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > > Sent: 14 March 2018 15:28 > > To: user@nutch.apache.org > > Subject: Re: RE: Dependency between plugins > > > > Is there a way in nutch by which we can use different parser for > diff

RE: RE: Dependency between plugins

2018-03-14 Thread Yossi Tamari
nal Message- > From: Yash Thenuan Thenuan <rit2014...@iiita.ac.in> > Sent: 14 March 2018 15:28 > To: user@nutch.apache.org > Subject: Re: RE: Dependency between plugins > > Is there a way in nutch by which we can use different parser for different > websites? >

Re: RE: Dependency between plugins

2018-03-14 Thread Jorge Betancourt
Is there any reason why writing a `HtmlParseFilter` would not be enough? The HTML parser will execute its own logic and provide a DOM representation to all the filters and you can extract your own data from the DOM tree. At the moment individual parsers are matched by mimetype (see

Re: RE: Dependency between plugins

2018-03-14 Thread Yash Thenuan Thenuan
Is there a way in Nutch to use a different parser for different websites? I am trying to do this by writing a custom parser which will call different parsers for different websites. On 14 Mar 2018 14:19, "Semyon Semyonov" wrote: > As a side note, > > I had
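
One way to picture the per-website idea is a single component that selects its extraction logic by the host of the URL. The sketch below is plain Java and not a Nutch API; HostDispatch, register and extract are hypothetical names, and the extractors are modelled as simple String-to-String functions purely for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not part of Nutch: picks an extractor by URL host.
class HostDispatch {
    private final Map<String, Function<String, String>> byHost = new HashMap<>();
    private final Function<String, String> fallback;

    HostDispatch(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    // Registers site-specific logic for one host (stored in lower case).
    void register(String host, Function<String, String> extractor) {
        byHost.put(host.toLowerCase(Locale.ROOT), extractor);
    }

    // Applies the extractor registered for the URL's host, or the fallback if none matches.
    String extract(String url, String html) {
        String host = URI.create(url).getHost(); // may be null for URLs without an authority
        Function<String, String> chosen = fallback;
        if (host != null) {
            chosen = byHost.getOrDefault(host.toLowerCase(Locale.ROOT), fallback);
        }
        return chosen.apply(html);
    }

    public static void main(String[] args) {
        HostDispatch dispatch = new HostDispatch(html -> "default:" + html);
        dispatch.register("example.com", html -> "A:" + html);
        System.out.println(dispatch.extract("http://example.com/page", "x"));
        System.out.println(dispatch.extract("http://other.org/", "x"));
    }
}
```

The same lookup could live inside one parse filter, so the choice of logic happens per document rather than by swapping parsers.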

Re: RE: Dependency between plugins

2018-03-14 Thread Semyon Semyonov
As a side note, I had to implement my own parser with extra functionality; a simple copy/paste of the HTML parser's code did the job. Inheriting instead of copy-pasting may well be a bad idea. The HTML parser is a concrete, non-abstract class, therefore the inheritance will not be so