Re: URLNormalizer not working properly

Markus Jelsma Sat, 18 Feb 2012 12:06:30 -0800

Did you update the entire crawldb with that normalizer?


> Hi,
> 
> I'm seeing a strange problem. I configured regex-normalize.xml to escape
> whitespace, curly braces, etc., and it works when I check it with
> URLNormalizerChecker:
> echo "URL non escaped" | bin/nutch org.apache.nutch.net.URLNormalizerChecker
> output: escaped URL
> 
> But when I run a crawl with Nutch, I can still see "bad" URLs being fetched.
> 
> Any explanation for this?
> 
> Remi
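URLNormalizerChecker only applies the rules to the URL you pipe into it. It says nothing about URLs that were already stored before the rules were added, which is presumably why the reply asks whether the whole crawldb was re-processed. The sketch below is a minimal Python illustration of that point, not Nutch code. The rule list (space to %20, braces to %7B/%7D) and the `normalize` and `renormalize` helpers are hypothetical, and a Python set stands in for the crawldb.

```python
import re

# Hypothetical pattern/substitution pairs, similar in spirit to the
# <pattern>/<substitution> rules in regex-normalize.xml (not Nutch defaults).
RULES = [
    (re.compile(r" "), "%20"),
    (re.compile(r"\{"), "%7B"),
    (re.compile(r"\}"), "%7D"),
]


def normalize(url: str) -> str:
    """Apply each rule in order, the way a regex-based normalizer would."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url


def renormalize(stored_urls: set[str]) -> set[str]:
    """Re-run the rules over URLs that were stored earlier (stand-in for a crawldb)."""
    return {normalize(u) for u in stored_urls}


# URLs stored before the rules existed stay "bad" until they are re-processed.
crawldb = {"http://example.com/a b{1}", "http://example.com/ok"}
print(sorted(renormalize(crawldb)))
```

Because the rules never touch `%`, running them twice gives the same result as running them once, so re-processing already-escaped entries is harmless in this sketch.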
