All,

Is there a way to have Nutch ignore anchor links within a page, without ignoring other pages on the same site? (Sorry for not being more specific about which component is involved: crawler, indexer, parser, etc.)

Some of the pages I am indexing are mailing list archives that contain a very large number of anchors. Nutch re-fetches and re-parses the same page countless times, once for each anchor link. I know there is a property to ignore internal links, but I want other pages on the same host to be included, just not self-referencing links within a page.
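To make the behaviour I am after concrete, here is a rough sketch in plain Java. It is not Nutch code and does not use any Nutch API; the class name FragmentStripper and its methods are made up purely for illustration. The idea is that every outlink has its "#anchor" part dropped before it is compared with URLs already seen, so links that differ only by anchor collapse to a single page while other pages on the same host are kept.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Nutch.
final class FragmentStripper {

    // Returns the URL without its fragment ("#..." suffix), if it has one.
    static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Collapses outlinks that differ only by fragment, keeping first-seen order.
    static Set<String> uniquePages(List<String> outlinks) {
        Set<String> pages = new LinkedHashSet<>();
        for (String link : outlinks) {
            pages.add(stripFragment(link));
        }
        return pages;
    }
}
```

With something like this in place, links such as http://example.org/archive/0001.html#msg1 and http://example.org/archive/0001.html#msg2 (example URLs, not my real ones) would count as one page, while http://example.org/archive/0002.html would still be fetched separately.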
Any help would be appreciated.

Thanks,
Jeff
