Hi Gideon,

On Sun, Feb 14, 2016 at 1:50 AM, <[email protected]> wrote:
> Subject: Extracting title description and keywords from a fetched URL
>
> Hi everyone,
>
> I'm trying to crawl several websites and extract only their title, keyword
> and description (and nothing else)
> I saw several examples on how to do that.
> However they all propose complicated (at least to a Nutch newbie) plugins
> configuration and settings
> Since my use case sounds like a very common one
> I was wondering if there is any simpler solution?
> If there is no easier solution, can anyone at least explain what are the
> steps required for me to extract just these specific tags?
>
> Thanks in advance
>

Although it seems complicated to implement a Nutch plugin, it is actually not so bad. The entire plugin system is documented at [0], with a tutorial for writing a plugin provided at [1].

If you have any further issues with your implementation, please let us know and we will try to help.

Thanks
Lewis

[0] http://wiki.apache.org/nutch/PluginCentral
[1] http://wiki.apache.org/nutch/WritingPluginExample

