If the site starts to redirect and you are crawling the host that is redirected
away from, you're in trouble. With the HostNormalizer, however, you can
renormalize all URLs to the host that is being redirected to.
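
To illustrate what renormalizing to the redirect target means, here is a rough
Python sketch. It is not the Nutch plugin's code or configuration format; the
HOST_MAP table and the normalize_host function are hypothetical names made up
for this example, and the mapping simply assumes mobile365.ru redirects to
www.mobile365.ru as described in the question.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping (an assumption for this sketch):
# host that gets redirected -> host it redirects to.
HOST_MAP = {"mobile365.ru": "www.mobile365.ru"}


def normalize_host(url, host_map=HOST_MAP):
    """Rewrite the host of `url` to its mapped redirect target, if any."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased, without port; None for relative URLs
    if host is None or host not in host_map:
        return url  # nothing to rewrite
    netloc = host_map[host]
    if parts.port is not None:
        # Keep an explicit port; userinfo (user:pass@) is dropped in this sketch.
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


print(normalize_host("http://mobile365.ru/page?id=1"))
```

With every discovered URL passed through such a step, both spellings of the
host collapse into the one the site actually serves, so the crawl never sits on
the wrong side of the redirect.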
 
 
-----Original message-----
> From:Alexei Korolev <alexei.koro...@gmail.com>
> Sent: Wed 08-Aug-2012 15:55
> To: user@nutch.apache.org
> Subject: Re: crawling site without www
> 
> > You can use the HostURLNormalizer for this task or just crawl the www OR
> > the non-www, not both.
> >
> 
> I'm trying to crawl only the version without www. As far as I can see, I can
> remove the "www." prefix using a properly configured regex-normalize.xml.
> But will it work if mobile365.ru redirects to www.mobile365.ru (a very
> common situation on the web)?
> 
> Thanks.
> 
> Alexei
> 
