I'm looking to index only a very small subset of the pages that I fetch, where whether or not a page belongs in that subset is determined by the page's content when it is parsed. Has anyone done anything like this, or does anyone know roughly which classes I should modify? I'm flagging the documents (index / don't-index) with an extended HtmlParseFilter class, but I'm not sure about the indexing side.
Best, John
