Only indexing pages meeting certain criteria

John Thompson Fri, 27 Jun 2008 17:41:43 -0700

I want to index only a very small subset of the pages that I fetch, where
membership in that subset is determined by the page's content at parse time.
Has anyone done something like this, or does anyone know roughly which
classes I should modify?  I'm already flagging the documents (index /
don't-index) with a custom HtmlParseFilter, but I'm not sure how to handle
the indexing side.
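
To make the setup concrete, here is a rough, self-contained sketch of the
flag-then-filter idea.  It deliberately does not use the real Nutch classes:
ParsedPage, the "x-should-index" key, and the method names are all made-up
stand-ins, and the "contains a term" check is only a placeholder for whatever
content test is actually used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class IndexFlagSketch {
    // Hypothetical metadata key; not a Nutch constant.
    static final String INDEX_FLAG = "x-should-index";

    /** Stand-in for a parsed page: URL, extracted text, parse-time metadata. */
    static class ParsedPage {
        final String url;
        final String text;
        final Map<String, String> meta = new HashMap<>();

        ParsedPage(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    /** Parse-side step: inspect the content and record an index / don't-index flag. */
    static void flag(ParsedPage page, String requiredTerm) {
        boolean keep = page.text.toLowerCase(Locale.ROOT)
                .contains(requiredTerm.toLowerCase(Locale.ROOT));
        page.meta.put(INDEX_FLAG, Boolean.toString(keep));
    }

    /** Index-side step: a page is indexed only if it was flagged "true". */
    static boolean shouldIndex(ParsedPage page) {
        // parseBoolean(null) is false, so unflagged pages are skipped.
        return Boolean.parseBoolean(page.meta.get(INDEX_FLAG));
    }

    /** Returns the URLs of the pages that would reach the index. */
    static List<String> selectForIndex(List<ParsedPage> pages) {
        List<String> urls = new ArrayList<>();
        for (ParsedPage page : pages) {
            if (shouldIndex(page)) {
                urls.add(page.url);
            }
        }
        return urls;
    }
}
```

The part I have working corresponds to flag(); what I'm unsure about is where
the equivalent of shouldIndex() belongs on the indexing side.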


Best,
John
