I'd like to have Nutch index only the content within certain meta tags; essentially, the content that matters would appear inside a <content> tag in the HTML. I am thinking of adding an HTML filter to pull out the <content> tag, but I would also need to modify the Nutch document to erase everything that is not inside <content>. Is that right? Thanks!
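
To make the idea concrete, here is a rough standalone sketch of the extraction step: keep only the text inside <content> and drop everything else. It is not Nutch code and uses no Nutch API; the names `ContentTagExtractor` and `extract_content_text` are made up for illustration, and the sample page is an assumption about what the HTML might look like. Inside Nutch, the same logic would presumably live in a parse filter plugin.

```python
from html.parser import HTMLParser


class ContentTagExtractor(HTMLParser):
    """Collects only the text found inside <content>...</content> elements.

    Hypothetical helper for illustration; not part of Nutch.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._depth = 0      # > 0 while we are inside a <content> element
        self._chunks = []    # text fragments seen inside <content>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <CONTENT> matches as well.
        if tag == "content":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "content" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text outside <content> is ignored, which "erases" it.
        if self._depth > 0:
            self._chunks.append(data)

    def text(self):
        # Join fragments with a space, then collapse runs of whitespace.
        # Note: this also puts a space where inline markup splits a word.
        return " ".join(" ".join(self._chunks).split())


def extract_content_text(html):
    """Return the whitespace-normalized text inside <content> tags."""
    parser = ContentTagExtractor()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.text()


if __name__ == "__main__":
    # Assumed sample page: navigation and footer should be dropped.
    sample = (
        "<html><body><p>navigation</p>"
        "<content><p>Hello <b>world</b></p></content>"
        "<p>footer</p></body></html>"
    )
    print(extract_content_text(sample))
```

A page with no <content> tag yields an empty string here, so the indexing side would need to decide whether to skip such pages or fall back to the full text.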
- [Nutch-general] index content within metatag only Sunnyvale Fl
- [Nutch-general] Re: index content within metatag only Thomas Delnoij
