Hi nutch-users,

I would like to write a Nutch plugin that parses each fetched URL, extracts
selected elements from the page (using something like the jsoup parser),
builds a JSON document from them, and writes it to S3 (I am running my Nutch
cluster on AWS). Is there an existing plugin that already does some of this
work?
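
To make the use case concrete, here is a minimal sketch of the kind of
per-page output I have in mind. It is illustrative only: the class name
PageToJson and the fields (url, title, links) are hypothetical, plain regular
expressions stand in for jsoup so the snippet runs without dependencies, and
the Nutch plugin wiring and the S3 upload are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns one fetched page into a small JSON record.
// Regexes are a stand-in for a real HTML parser such as jsoup.
class PageToJson {
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Escape the characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Extract the page title and all double-quoted href values, then build the JSON.
    static String toJson(String url, String html) {
        Matcher t = TITLE.matcher(html);
        String title = t.find() ? t.group(1).trim() : "";
        List<String> links = new ArrayList<>();
        Matcher h = HREF.matcher(html);
        while (h.find()) {
            links.add("\"" + escape(h.group(1)) + "\"");
        }
        return "{\"url\":\"" + escape(url) + "\",\"title\":\"" + escape(title)
                + "\",\"links\":[" + String.join(",", links) + "]}";
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Example</title></head>"
                + "<body><a href=\"/about\">About</a></body></html>";
        System.out.println(toJson("http://example.com/", html));
    }
}
```

In the real plugin, I assume the extraction would be done with jsoup
selectors and the resulting string would be handed to an S3 client rather
than printed.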

I do see an example of how to write a parser plugin at
https://wiki.apache.org/nutch/WritingPluginExample-1.2
I would also like to hear from people who have tried a similar use case, so
I can learn from others' experience.

Thanks
Srini
