I had 18000 db_fetched, now only 54. Pretty dangerous command :-(

On Saturday, February 18, 2012, Markus Jelsma <[email protected]> wrote:
> Did you update the entire crawldb with that normalizer?
>
>> Hi,
>>
>> I'm witnessing a weird problem. I configured regex-normalize.xml to escape
>> whitespace, curly braces... and it works when I check it with
>> URLNormalizerChecker:
>> *echo "URL non escaped" | bin/nutch org.apache.nutch.net.URLNormalizerChecker*
>> *output: escaped URL*
>>
>> But when I run a crawl with Nutch, I can still see "bad" URLs being fetched.
>>
>> Any explanation for this?
>>
>> Remi
>
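
For reference, the kind of escaping described in the quoted message can be
sketched outside Nutch as an ordered list of pattern/substitution pairs. This
is only an illustration: the rules, the normalize() helper, and the
example.com URL are assumptions, not the actual contents of the
regex-normalize.xml discussed above, and Python's re module stands in for the
Java regex engine that Nutch uses.

```python
import re

# Hypothetical rules, mirroring the pattern/substitution pairs one might
# put in regex-normalize.xml. These are NOT the poster's actual rules.
RULES = [
    (re.compile(r"\s"), "%20"),  # any whitespace character -> %20
    (re.compile(r"\{"), "%7B"),  # opening curly brace -> %7B
    (re.compile(r"\}"), "%7D"),  # closing curly brace -> %7D
]


def normalize(url: str) -> str:
    """Apply each rule in order and return the rewritten URL."""
    for pattern, substitution in RULES:
        url = pattern.sub(substitution, url)
    return url


if __name__ == "__main__":
    # example.com is a placeholder URL used only for illustration.
    print(normalize("http://example.com/a b/{id}"))
```

Feeding a few known "bad" URLs through a stand-alone check like this only
shows what the rules do to a single string, much as URLNormalizerChecker does.
It says nothing about which crawl steps apply the normalizer.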
