Hi, I'd like to use Nutch to index intranet/site content. The content is all template-based, and I'd like to index only a portion of each HTML page. Specifically, I'd like to index only the content between a pair of marker comments in the page (though I could just as easily wrap the content in an extra element that would be easier to match). Is this possible without writing a new HTML parser plugin? If so, how?
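
To make the goal concrete, here is a minimal sketch of the kind of filtering I mean, written in plain Python outside of Nutch. The marker names `index:start` and `index:end` and the function `extract_marked_text` are made up for illustration; this is not Nutch code and does not use any Nutch API.

```python
from html.parser import HTMLParser

# Hypothetical marker comments; a real template would define its own.
START_MARKER = "index:start"
END_MARKER = "index:end"


class MarkedTextExtractor(HTMLParser):
    """Collects text that appears between the start and end marker comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.inside = False  # True while we are between the two markers
        self.chunks = []     # text fragments found inside the marked region

    def handle_comment(self, data):
        marker = data.strip()
        if marker == START_MARKER:
            self.inside = True
        elif marker == END_MARKER:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def extract_marked_text(html):
    """Return the whitespace-normalized text between the marker comments."""
    parser = MarkedTextExtractor()
    parser.feed(html)
    parser.close()
    # Join fragments with a space so adjacent blocks do not run together,
    # then collapse any repeated whitespace.
    return " ".join(" ".join(parser.chunks).split())


page = (
    "<html><body><div>Nav menu</div>"
    "<!-- index:start --><p>Real <b>content</b> here.</p><!-- index:end -->"
    "<div>Footer</div></body></html>"
)
print(extract_marked_text(page))
```

Only the text inside the marked region should end up in the index; the navigation and footer text from the template should be dropped.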
Thanks! Brent
