I'm looking to index only a very small subset of the pages that I fetch,
where whether a page belongs in that subset is determined by its content
when it is parsed. Has anyone done anything like this, or does anyone know
roughly which classes I should modify? I'm already flagging each document
(index / don't-index) with an extended HtmlParseFilter class, but I'm not
sure how to act on that flag on the indexing side.

Best,
John

Reply via email to