Hi Jun,

Can you use a single regex pattern that matches all of the special characters? Alternatively, you could write your own URL normalizer plugin to fit your requirements.
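
To illustrate the single-pattern idea, here is a rough sketch in plain Java. It is not Nutch code: the class name SpecialCharEncoder, the method name encode, and the character class [?,=@] (taken from the examples in your mail) are all assumptions for illustration. The same logic could sit inside a custom URL normalizer plugin, but the plugin wiring is not shown here. Note that encoding "?" and "=" changes how a server reads the query string, so you may want to limit the character class to what your site actually needs.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: percent-encodes a fixed set of
// characters using one regex instead of one rule per character.
final class SpecialCharEncoder {

    // Assumption: the characters to encode are the ones quoted in the mail.
    private static final Pattern SPECIAL = Pattern.compile("[?,=@]");

    static String encode(String url) {
        Matcher m = SPECIAL.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Encode every UTF-8 byte of the matched character as %XX.
            StringBuilder hex = new StringBuilder();
            for (byte b : m.group().getBytes(StandardCharsets.UTF_8)) {
                hex.append(String.format("%%%02X", b & 0xFF));
            }
            // quoteReplacement keeps the replacement text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(hex.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, SpecialCharEncoder.encode("http://example.com/a?b=c,d@e") gives "http://example.com/a%3Fb%3Dc%2Cd%40e".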
On Wed, Apr 10, 2013 at 8:17 PM, Rajani Maski <[email protected]> wrote:

> Hi,
>
> I think this thread should be useful:
>
> http://lucene.472066.n3.nabble.com/Parsed-content-in-form-of-special-characters-td4047239.html
>
> Thanks & Regards
> Rajani Maski
>
> On Sun, Apr 7, 2013 at 4:56 AM, Jun Zhou <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm using nutch 1.6 to crawl a web site which have lots of special
> > characters in the url, like "?,=@" etc. For each character, I can add a
> > regex in the regex-normalize.xml to change it into percent encoding.
> >
> > My question is, is there an easier way to do this? Like a url-encode method
> > to encode all the special characters rather than add regex one by one?
> >
> > Thanks!

--
Don't Grow Old, Grow Up... :-)

