Yes, that matters indeed! But if you don't normalize, your URL filters may not
work properly, although that should not be a problem in small crawls or with a
limited number of (good) websites. You could try the following normalizing rule
as your first rule to remove very long URLs:

.{256,}

With an empty substitution, this rule should `empty` every URL of 256 or more characters.
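To illustrate what the rule does, here is a small standalone Java sketch. It is not Nutch code. It assumes the regex normalizer applies each pattern with a plain `replaceAll(substitution)`, as `java.util.regex` does. The class name `LongUrlNormalizerDemo` and the "drop empty URLs" filter step are hypothetical stand-ins meant to show the normalize-then-filter order, not how Nutch internally handles an emptied URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LongUrlNormalizerDemo {
    // Same pattern as the normalizer rule: any run of 256 or more characters.
    private static final Pattern TOO_LONG = Pattern.compile(".{256,}");

    // Applies the rule with an empty substitution. A URL of 256+ characters
    // is matched as a whole and replaced by "", while shorter URLs are unchanged.
    static String normalize(String url) {
        return TOO_LONG.matcher(url).replaceAll("");
    }

    // Hypothetical filter step (assumption): normalize first, then drop any
    // URL that the normalizer emptied, so later filters never see long URLs.
    static List<String> normalizeThenFilter(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            String normalized = normalize(url);
            if (!normalized.isEmpty()) {
                kept.add(normalized);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String shortUrl = "http://example.com/page";
        String longUrl = "http://example.com/" + "a".repeat(300); // Java 11+
        System.out.println(normalize(shortUrl));          // http://example.com/page
        System.out.println(normalize(longUrl).isEmpty()); // true
        System.out.println(normalizeThenFilter(List.of(shortUrl, longUrl))); // [http://example.com/page]
    }
}
```

Placing the rule first means every later normalizer rule and every URL filter only ever sees an empty string for such URLs, rather than running its own regexes over hundreds of characters.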
 
 
-----Original message-----
> From: eakarsu <[email protected]>
> Sent: Monday 24th June 2013 22:03
> To: [email protected]
> Subject: Re: Parse reduce stage take forver
> 
> Sebastian,
> 
> Does the order of the normalize and filter calls matter?
> Currently, Nutch first normalizes and then filters.
> 
> What if we reverse it: filter first and then normalize? Suppose we have very
> long URLs; do they kill the normalizer?
> 
> Thanks
> 
> Erol Akarsu
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Parse-reduce-stage-take-forver-tp4072755p4072834.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
