Hi,

The reducer of a huge parse job takes forever! It trips over numerous URL
filter exceptions, mostly like this one:

2011-07-18 15:07:15,360 ERROR org.apache.nutch.urlfilter.domain.DomainURLFilter: Could not apply filter on url: Anlagen:AdresseAvans
java.net.MalformedURLException: unknown protocol: anlagen

I suspect the issue is the OutlinkExtractor being a bit too eager. How about
making it a bit more configurable? Right now this is a real waste of CPU cycles.
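
To illustrate the kind of configurability I have in mind, here is a minimal sketch, not actual Nutch code. The class SchemePrecheck, the method hasAllowedScheme and the scheme list are all made up for this example. The idea is to check the scheme of an extracted candidate against a configurable allowlist before it is ever turned into a java.net.URL, so that things like "Anlagen:AdresseAvans" are dropped cheaply instead of raising an exception later on:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; it is not part of Nutch.
public final class SchemePrecheck {

    // Assumed allowlist. A real setup would presumably read this from
    // configuration; no such Nutch property is implied here.
    public static final Set<String> ALLOWED_SCHEMES =
            new HashSet<>(Arrays.asList("http", "https", "ftp", "file"));

    // Returns true only if the text before the first ':' is an allowed scheme.
    public static boolean hasAllowedScheme(String candidate) {
        int colon = candidate.indexOf(':');
        if (colon <= 0) {
            return false; // no scheme at all, or an empty one
        }
        String scheme = candidate.substring(0, colon).toLowerCase(Locale.ROOT);
        return ALLOWED_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        // The "URL" from the log above is rejected without any exception.
        System.out.println(hasAllowedScheme("Anlagen:AdresseAvans")); // false
        System.out.println(hasAllowedScheme("http://example.com/"));  // true
    }
}
```

A plain string comparison like this should be much cheaper than constructing a URL and catching MalformedURLException for every bogus outlink, though I have not measured it.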

Thanks
