I'd like to have Nutch index only the content inside certain meta tags;
essentially, the content that matters would appear inside a <content> tag in
the HTML.  I am thinking of adding an HTML filter to extract the <content>
tag, but I would also need to change the Nutch document so that everything
outside <content> is dropped. Is that right?  Thanks!
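
To make the idea concrete, here is a rough sketch of just the extraction step in plain Java. It does not use any Nutch API: the class name ContentTagExtractor and the regex-based approach are my own assumptions for illustration, and it does not handle nested <content> tags. Inside Nutch this logic would presumably live in a parse or indexing plugin rather than in a standalone class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: keeps only the text found
// inside <content>...</content> and discards everything else.
final class ContentTagExtractor {
    // DOTALL lets '.' span line breaks; CASE_INSENSITIVE also accepts <CONTENT>.
    private static final Pattern CONTENT = Pattern.compile(
            "<content\\b[^>]*>(.*?)</content\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches any remaining markup inside the extracted block.
    private static final Pattern TAGS = Pattern.compile("<[^>]+>");

    static String extract(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = CONTENT.matcher(html);
        while (m.find()) {
            // Strip inner tags, then collapse runs of whitespace.
            String text = TAGS.matcher(m.group(1)).replaceAll(" ")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        // Empty string when the page has no <content> block at all.
        return out.toString();
    }
}
```

The returned string is what I would want indexed as the page text, in place of the full parsed text.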
