How can I index only a portion of HTML content?

Brent Verner Mon, 03 Jul 2006 03:19:49 -0700

Hi,

  I'd like to use Nutch to index intranet/site content.  The content is all
template-based, and I'd like to index only a portion of each HTML page.
Specifically, I'd like to index only the content/words between a pair of
comments in the HTML page (though I could just as easily wrap the content in
another document node that would be easier to match).  Is this possible
without writing a new HTML parser plugin?  If so, how?
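
To make the intent concrete, here is a rough sketch of the extraction I have
in mind.  It is plain Java and uses no Nutch API at all.  The marker comments
"index:start" and "index:end" and the class name MarkerExtractor are made up
for illustration, and the tag stripping is deliberately crude.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerExtractor {
    // Hypothetical marker comments; the real templates may use different text.
    private static final Pattern REGION = Pattern.compile(
        "<!--\\s*index:start\\s*-->(.*?)<!--\\s*index:end\\s*-->",
        Pattern.DOTALL);

    /** Returns the text of all marked regions, joined by spaces, tags removed. */
    public static String extract(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = REGION.matcher(html);
        while (m.find()) {
            // Crude tag stripping for illustration only; not a real HTML parser.
            String text = m.group(1).replaceAll("<[^>]*>", " ");
            out.append(text).append(' ');
        }
        // Collapse runs of whitespace left behind by the removed tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><body><div>nav</div>"
            + "<!-- index:start --><p>Hello <b>world</b></p><!-- index:end -->"
            + "<div>footer</div></body></html>";
        System.out.println(extract(page));
    }
}
```

Everything outside the markers (navigation, footer, and so on) is dropped, and
only the remaining text would be handed to the indexer.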


Thanks!
  Brent
