Are you using 1.4? It keeps a previous version of the DB in crawldb/old/ by 
default.
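If so, you may be able to roll back by swapping crawldb/old/ back into place. The sketch below is a hypothetical helper, not a Nutch command. It assumes a local (non-HDFS) crawl directory and that no Nutch job is touching the CrawlDb while you run it. On HDFS you would do the same two moves with `hadoop fs -mv` instead.

```shell
# Hypothetical helper (not part of Nutch): restore <crawldb>/old as <crawldb>/current.
# Assumes a local filesystem and that no Nutch job is currently using the crawldb.
restore_crawldb() {
  db="$1"
  if [ ! -d "$db/old" ]; then
    echo "no backup found at $db/old" >&2
    return 1
  fi
  # Move the damaged version aside rather than deleting it, in case you need it later.
  if [ -d "$db/current" ]; then
    mv "$db/current" "$db/current.broken.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  # The previous version becomes the live CrawlDb again.
  mv "$db/old" "$db/current"
}
```

Usage, with `crawl/crawldb` standing in for your own path: run `restore_crawldb crawl/crawldb`, then `bin/nutch readdb crawl/crawldb -stats` to check that the db_fetched count is back to what you had before.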

> I had 18000 db_fetched, now only 54. Pretty dangerous command :-(
> 
> On Saturday, February 18, 2012, Markus Jelsma <[email protected]> wrote:
> > Did you update the entire crawldb with that normalizer?
> > 
> >> Hi,
> >> 
> >> I'm witnessing a weird problem. I configured regex-normalize.xml to
> >> escape whitespace, curly braces... and it works when checking with
> >> URLNormalizerChecker:
> >> *echo "URL non escaped" | bin/nutch
> >> org.apache.nutch.net.URLNormalizerChecker*
> >> *output: escaped URL*
> >> 
> >> But when I run crawl with Nutch, I can still see "bad" URLs being
> >> fetched.
> 
> >> Any explanation for this?
> >> 
> >> Remi
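Following up on Markus's question: if the unescaped URLs were already in the CrawlDb before the normalizer rules were added, they may still be there. One way to check is to dump the CrawlDb with `bin/nutch readdb <crawldb> -dump <dir>` and list the URLs that still contain a space or curly brace. The sketch below assumes the Nutch 1.x plain-text dump layout, where each record starts with `<url><TAB>Version: ...`, and that the dump files are named `part-*`. The function name `unescaped_urls` is hypothetical.

```shell
# Hypothetical helper: print CrawlDb URLs that still contain ' ', '{' or '}'.
# Assumes a Nutch 1.x text dump (bin/nutch readdb <crawldb> -dump <dir>) in which
# each record's first line is "<url><TAB>Version: <n>".
unescaped_urls() {
  dumpdir="$1"
  cat "$dumpdir"/part-* |
    # Only look at record header lines (url, tab, "Version:"), then test the URL field.
    awk -F'\t' 'NF > 1 && $2 ~ /^Version:/ && $1 ~ /[ {}]/ { print $1 }'
}
```

For example, `bin/nutch readdb crawl/crawldb -dump crawldb-dump` followed by `unescaped_urls crawldb-dump` (both paths are placeholders). Any output would suggest those entries were stored before the new rules were in place.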
