I had 18000 db_fetched, now only 54. Pretty dangerous command :-(
On Saturday, February 18, 2012, Markus Jelsma <[email protected]> wrote:
> Did you update the entire crawldb with that normalizer?
>
>> Hi,
>>
>> I'm witnessing a weird problem. I configured regex-normalize.xml to escape
>> whitespaces, curly braces...and it works while checking with
>> URLNormalizerChecker:
>> *echo "URL non escaped" | bin/nutch
>> org.apache.nutch.net.URLNormalizerChecker*
>> *output: escaped URL*
>>
>> But when I run crawl with Nutch, I can still see "bad" URLs being fetched.
>>
>> Any explanation for this?
>>
>> Remi

