I think the http://wiki.apache.org/nutch/WritingPluginExample tutorial shows
how to implement the filter: you would be filtering on the 'content' metatag
instead of the 'recommended' one. It is then up to you which other filters
you enable or disable. Also look at
org.creativecommons.nutch.CCDeleteUnlicensedTool for an example of deleting
pages that are missing certain fields from the index.
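
In case it helps, here is a rough sketch of just the extraction step, i.e.
collecting the text inside <content> elements from a parsed DOM tree. It
uses only the JDK's org.w3c.dom classes and none of the Nutch plugin
plumbing; the class name ContentTagExtractor and its methods are made up for
illustration, and the sample page is parsed as well-formed XML only to keep
the demo self-contained. Inside a real filter you would walk the DOM that the
HTML parser hands you instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: keeps only text found
// inside <content> elements and ignores the rest of the page.
public class ContentTagExtractor {

    /** Returns the text of every <content> element under root, space-separated. */
    public static String extract(Node root) {
        StringBuilder out = new StringBuilder();
        collect(root, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        // HTML parsers often upper-case element names, so compare case-insensitively.
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "content".equalsIgnoreCase(node.getNodeName())) {
            String text = node.getTextContent().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
            return; // getTextContent() already covered all descendants
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    /** Demo-only convenience: parses a well-formed XML string, then extracts. */
    public static String extractFromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return extract(doc);
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>menu</p>"
                + "<content>Keep <b>this</b></content></body></html>";
        System.out.println(extractFromXml(page));
    }
}
```

The extracted string is what you would then index in place of the full page
text, so everything outside <content> never reaches the index.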

Rgrds, Thomas

On 2/13/06, Sunnyvale Fl <[EMAIL PROTECTED]> wrote:
>
> I'd like to have nutch index only contents within certain metatags;
> essentially, contents that matter would appear inside a <content> tag in
> html format.  I am thinking of adding a htmlfilter to filter out the
> content tag, but I would also need to augment the nutch Document to erase
> everything that are non <content> - is that right?  thanks!
>
>
