Thanks Rajani! Actually, the problem is special characters in the URL, not in the content. Thanks anyway!
On Wed, Apr 10, 2013 at 5:17 AM, Rajani Maski <[email protected]> wrote:
> Hi,
>
> I think this thread should be useful:
>
> http://lucene.472066.n3.nabble.com/Parsed-content-in-form-of-special-characters-td4047239.html
>
> Thanks & Regards
> Rajani Maski
>
> On Sun, Apr 7, 2013 at 4:56 AM, Jun Zhou <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm using Nutch 1.6 to crawl a web site that has lots of special
> > characters in the URL, like "?,=@" etc. For each character, I can add a
> > regex in regex-normalize.xml to change it into percent encoding.
> >
> > My question is: is there an easier way to do this? Like a url-encode
> > method to encode all the special characters, rather than adding regexes
> > one by one?
> >
> > Thanks!
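For context, the per-character approach described in the quoted question looks roughly like this in Nutch's conf/regex-normalize.xml. This is a hedged sketch, not a tested config: the "@" pattern and the "%40" substitution are illustrative, and each special character would need its own `<regex>` entry, which is exactly the one-by-one tedium the question is about.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/regex-normalize.xml rules (illustrative, not a
     verified config): each <regex> rewrites one literal character in
     matched URLs to its percent encoding. -->
<regex-normalize>
  <!-- Rewrite a literal "@" to %40. -->
  <regex>
    <pattern>@</pattern>
    <substitution>%40</substitution>
  </regex>
  <!-- Further characters ("?", "=", ...) each need their own rule,
       with regex metacharacters escaped in the pattern. -->
</regex-normalize>
```

One rule per character is why a generic url-encode step (e.g. a custom URLNormalizer plugin) was being asked about as the easier alternative.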

