Thanks Rajani!

Actually, the problem is special characters in the URL, not in the content.
Thanks anyway!
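
For reference, here is a minimal sketch of the kind of url-encode helper I was
asking about. It is plain Java, not part of Nutch: the class name
PercentEncoder and the method encode are made up for illustration. It
percent-encodes every byte outside the RFC 3986 unreserved set, so it is meant
for a single URL component (a path segment or a query value), not a whole URL.
Wiring it into Nutch (for example through a custom URL normalizer plugin) is
not shown and would still need to be worked out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Nutch class: percent-encodes one URL component.
final class PercentEncoder {
    // RFC 3986 "unreserved" characters, which never need escaping.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    private PercentEncoder() {}

    static String encode(String component) {
        StringBuilder out = new StringBuilder();
        // Work on UTF-8 bytes so non-ASCII characters become multi-byte escapes.
        for (byte b : component.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && UNRESERVED.indexOf((char) v) >= 0) {
                out.append((char) v);          // safe character, copy as-is
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }
}
```

For example, PercentEncoder.encode("?,=@") gives "%3F%2C%3D%40". Passing a
full URL would also escape ":" and "/", so the URL has to be split into its
components first.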


On Wed, Apr 10, 2013 at 5:17 AM, Rajani Maski <[email protected]> wrote:

> Hi,
>
>  I think this thread should be useful:
>
> http://lucene.472066.n3.nabble.com/Parsed-content-in-form-of-special-characters-td4047239.html
>
>
>
> Thanks & Regards
> Rajani Maski
>
>
>
> On Sun, Apr 7, 2013 at 4:56 AM, Jun Zhou <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm using Nutch 1.6 to crawl a web site that has lots of special
> > characters in its URLs, like "?,=@" etc.  For each character, I can add a
> > regex to regex-normalize.xml to change it into percent encoding.
> >
> > My question is: is there an easier way to do this? Something like a
> > url-encode method that encodes all the special characters, rather than
> > adding regexes one by one?
> >
> > Thanks!
> >
>