Are you using 1.4? It keeps a previous version of the DB in crawldb/old/ by default.
> I had 18000 db_fetched, now only 54. Pretty dangerous command :-(
>
> On Saturday, February 18, 2012, Markus Jelsma <[email protected]> wrote:
> > Did you update the entire crawldb with that normalizer?
> >
> >> Hi,
> >>
> >> I'm witnessing a weird problem. I configured regex-normalize.xml to
> >> escape whitespaces, curly braces...and it works while checking with
> >> URLNormalizerChecker:
> >> *echo "URL non escaped" | bin/nutch
> >> org.apache.nutch.net.URLNormalizerChecker*
> >> *output: escaped URL*
> >>
> >> But when I run crawl with Nutch, I can still see "bad" URLs being
> >> fetched.
> >> Any explanation for this?
> >>
> >> Remi

