Hi suyash,

This can be addressed by commenting out every place where the WebPage object [0] is modified within each job (and possibly each plugin). For example:

https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/parse/ParseUtil.java#L358

You would need to step through the entire codebase and comment out the code that sets (and maybe gets) values on the WebPage object.

The alternative is to create a new WebPage schema containing only the outlinks data structure, then use the 'ant generate-gora-src' target to regenerate the WebPage class:

https://github.com/apache/nutch/blob/2.x/build.xml#L612-L623

You can then recompile the project and address each compile error in turn until all that remains is code pertaining to outlinks.

hth
Lewis
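
P.S. For the second (schema) option, a reduced schema might look roughly like the sketch below. It is written from memory, not copied from the repository. It assumes the Avro schema lives in src/gora/webpage.avsc and that the record name, namespace and outlinks field definition match the 2.x branch, so check it against the real file before running the generate target. The Gora datastore mapping files will probably also refer to fields that no longer exist and may need the same trimming.

```json
{
  "name": "WebPage",
  "type": "record",
  "namespace": "org.apache.nutch.storage",
  "doc": "Sketch only: WebPage reduced to the outlinks map (assumed field definition)",
  "fields": [
    {
      "name": "outlinks",
      "type": {
        "type": "map",
        "values": ["null", "string"]
      },
      "default": {}
    }
  ]
}
```

After regenerating, anything in the jobs or plugins that still calls a removed getter or setter will show up as a compile error, which gives you the list of places to fix.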
[0] https://github.com/apache/nutch/blob/2.x/src/java/org/apache/nutch/storage/WebPage.java

On Thu, Mar 16, 2017 at 2:45 AM, <user-digest-h...@nutch.apache.org> wrote:
>
> From: suyash singh <suyashsingh91...@gmail.com>
> To: user@nutch.apache.org
> Cc:
> Bcc:
> Date: Tue, 14 Mar 2017 01:30:49 +0530
> Subject: Re: extract elements from each url as json and write it to s3
>
> Hi,
> I think you have to take database like mongodb. Write your custom gora
> mongodb mapping.xml and pass your JSON object to this.
>
> Thanks,
> suyash