Hi,

The reducer of a huge parse takes forever! It trips over numerous URL filter exceptions, mostly like this one:

2011-07-18 15:07:15,360 ERROR org.apache.nutch.urlfilter.domain.DomainURLFilter: Could not apply filter on url: Anlagen:AdresseAvans
java.net.MalformedURLException: unknown protocol: anlagen

I suspect the issue is the OutlinkExtractor being a bit too eager: it treats any "word:word" text as a URL. How about making it a bit more configurable? As it stands, this is a real waste of CPU cycles.

Thanks
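
P.S. To illustrate the kind of check I have in mind, here is a rough sketch. It is not existing Nutch code: the class name, the method name and the list of allowed schemes are all hypothetical, and in a real patch the list would presumably come from configuration. The idea is simply to look at the scheme of a candidate outlink before it ever reaches the URL filters, so that strings like "Anlagen:AdresseAvans" are dropped cheaply instead of raising a MalformedURLException each time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: pre-screens outlink candidates
// by scheme so obviously bogus ones never reach the URL filters.
public class OutlinkSchemeCheck {

    // Assumed allow-list; a real implementation would likely read this
    // from the job configuration rather than hard-coding it.
    static final Set<String> ALLOWED_SCHEMES =
        new HashSet<String>(Arrays.asList("http", "https", "ftp", "file"));

    // Returns true only if the candidate starts with "<scheme>:" and the
    // scheme (case-insensitive) is in the allow-list.
    static boolean hasAllowedScheme(String candidate) {
        if (candidate == null) {
            return false;
        }
        int colon = candidate.indexOf(':');
        if (colon <= 0) {
            return false; // no scheme at all, or an empty one
        }
        String scheme = candidate.substring(0, colon).toLowerCase(Locale.ROOT);
        return ALLOWED_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedScheme("Anlagen:AdresseAvans"));
        System.out.println(hasAllowedScheme("http://example.com/page"));
    }
}
```

A plain string check like this avoids constructing a java.net.URL (and the exception, plus the ERROR log line) for every false positive, which is where I assume most of the wasted time goes.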

