I would strongly recommend testing the normalizer(s) before crawling. There are two handy tools for seeing what you get after normalization:

  echo "http://www.example/(sndjnc22e3r3r))/abc.com" \
    | $NUTCH_HOME/bin/nutch org.apache.nutch.net.URLNormalizerChecker

  $NUTCH_HOME/bin/nutch plugin urlnormalizer-regex \
    org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer <url>

And yes, you can combine this with the URL filter checker:

  cat urls.txt \
    | $NUTCH_HOME/bin/nutch org.apache.nutch.net.URLNormalizerChecker \
    | $NUTCH_HOME/bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined

On 07/11/2013 07:59 AM, devang pandey wrote:
> Hello , I am working on nutch 1.2 to crawl a site . Now few urls are like
> www.example/(sndjnc22e3r3r))/abc.com. I want to strip out this part inside
> brackets to normalize my urls . For this I wrote a regex in my regex
> normalizer and substituted it . Now I am crawling again but still not able
> to get proper results.
>
> Please guide me in solving this issue
>
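
P.S. Before touching the normalizer configuration at all, it can help to try the substitution itself outside Nutch. The following is only a rough sketch using sed. The function name strip_paren_segment and the pattern are my own assumptions, not anything shipped with Nutch, and sed's extended regex syntax is not identical to the Java regex syntax the regex normalizer uses, so treat the result as a hint rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): removes the first "/(...)" path
# segment from a URL with sed, to preview what a similar rule might produce.
strip_paren_segment() {
  # [^/]* is greedy, so a doubled "))" closing the segment is consumed as well.
  printf '%s\n' "$1" | sed -E 's#/\([^/]*\)##'
}

# Sample URL taken from the question.
strip_paren_segment "http://www.example/(sndjnc22e3r3r))/abc.com"
```

If the output here looks right but the URLNormalizerChecker still returns the original URL, the problem is more likely in how the rule is configured or loaded than in the pattern itself.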

