Hi Gideon,

On Sun, Feb 14, 2016 at 1:50 AM, <[email protected]> wrote:

> Subject: Extracting title description and keywords from a fetched URL
> Hi everyone,
>
> I'm trying to crawl several websites and extract only their title, keywords
> and description (and nothing else).
> I saw several examples of how to do that.
> However, they all involve plugin configuration and settings that are
> complicated (at least for a Nutch newbie). Since my use case sounds like a
> very common one, I was wondering if there is a simpler solution.
> If there is no easier solution, can anyone at least explain the steps
> required to extract just these specific tags?
>
> Thanks in advance
>
>
Although implementing a Nutch plugin seems complicated, it is actually
not so bad. The entire plugin system is documented at [0], and a tutorial
for writing a plugin is provided at [1].
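
For a rough idea of the extraction logic itself, here is a minimal
standalone sketch that pulls the title, meta keywords and meta description
out of an HTML page. It is only an illustration under some assumptions: it
uses the JDK's built-in Swing HTML parser
(javax.swing.text.html.parser.ParserDelegator) instead of Nutch's own
parsing API, and the class name MetaExtractor is made up. In an actual
Nutch plugin, similar logic would typically live in an HtmlParseFilter
implementation, following the tutorial at [1].

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: extracts <title>, meta keywords and meta description. */
public class MetaExtractor {

    public static Map<String, String> extract(Reader html) throws IOException {
        Map<String, String> result = new HashMap<>();
        StringBuilder title = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTitle = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TITLE) {
                    inTitle = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Collect text only while inside <title>...</title>; body text is ignored.
                if (inTitle) {
                    title.append(data);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <meta> is an empty element, so the parser reports it here.
                if (tag != HTML.Tag.META) {
                    return;
                }
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (name == null || content == null) {
                    return;
                }
                String key = name.toString().toLowerCase(Locale.ROOT);
                if (key.equals("keywords") || key.equals("description")) {
                    result.put(key, content.toString().trim());
                }
            }
        };

        // true = ignore any charset declared in the page; the Reader already supplies decoded text.
        new ParserDelegator().parse(html, callback, true);

        if (title.length() > 0) {
            result.put("title", title.toString().trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><head><title>Example page</title>"
                + "<meta name=\"keywords\" content=\"nutch, crawler\">"
                + "<meta name=\"description\" content=\"A sample page.\">"
                + "</head><body><p>Body text is ignored.</p></body></html>";
        System.out.println(extract(new StringReader(page)));
    }
}
```

The Swing parser is old (HTML 3.2 era) but lenient, which is fine for a
sketch; inside Nutch you would rely on the parser Nutch already runs and
only keep the fields you care about.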
If you run into any further issues with your implementation, please let us
know and we will try to help.
Thanks
Lewis

[0] http://wiki.apache.org/nutch/PluginCentral
[1] http://wiki.apache.org/nutch/WritingPluginExample
