Hi suyash,
This issue can be addressed by commenting out every place where the
WebPage [0] object is populated within each job (and possibly each
plugin).
For example:
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/parse/ParseUtil.java#L358
You would need to step through the entire codebase and comment out every
call that sets (and maybe gets) values on the WebPage object.
The alternative is to create a new WebPage schema containing only the
outlinks data structure, then use the 'ant generate-gora-src' target to
regenerate the WebPage class.
https://github.com/apache/nutch/blob/2.x/build.xml#L612-L623
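As a rough, untested sketch of that second option, a WebPage schema reduced
to outlinks might look something like the following. The record name and
namespace are taken from the WebPage class [0]; the outlinks field definition
(a map with nullable string values) is my assumption of how it is declared in
the stock schema, so please check it against the Avro schema file in your own
checkout before running the target.

```json
{
  "name": "WebPage",
  "type": "record",
  "namespace": "org.apache.nutch.storage",
  "doc": "Illustrative only: WebPage reduced to a single field",
  "fields": [
    {
      "name": "outlinks",
      "type": {"type": "map", "values": ["null", "string"]},
      "doc": "Assumed definition; verify against the stock schema",
      "default": {}
    }
  ]
}
```

Every field you drop from the schema also disappears from the generated
class, which is what produces the compile errors you then work through.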
You can then recompile the project and fix each compile error in turn
until the only code remaining is the code pertaining to outlinks.
hth
Lewis
[0]
https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/storage/WebPage.java
On Thu, Mar 16, 2017 at 2:45 AM, wrote:
>
> From: suyash singh
> To: user@nutch.apache.org
> Date: Tue, 14 Mar 2017 01:30:49 +0530
> Subject: Re: extract elements from each url as json and write it to s3
> Hi,
> I think you have to use a database like MongoDB. Write your own custom
> gora mongodb mapping.xml and pass your JSON object to it.
>
> Thanks,
> suyash
>
>