Hi,

Use a regex URL filter to exclude those URLs and prevent them from being crawled again. In Nutch this is configured in conf/regex-urlfilter.txt (the urlfilter-regex plugin must be enabled in plugin.includes).
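A minimal sketch of such a rule in conf/regex-urlfilter.txt, assuming the rotating segment is always a single alphanumeric path component (adjust the pattern to the real token format on your site):

```
# Skip URLs whose first path segment is a rotating session-style token.
# The character class is an assumption based on the example URL in the
# question; tighten or loosen it to match the actual tokens.
-^https?://www\.example\.com/[a-zA-Z0-9]+/xyz\.aspx

# Accept anything else
+.
```

Rules are evaluated top to bottom and the first matching `-` or `+` rule decides, so the exclusion must appear before the catch-all `+.`.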
Cheers

-----Original message-----
> From: devang pandey <[email protected]>
> Sent: Wednesday 10th July 2013 10:29
> To: [email protected]
> Subject: nutch crawling issues
>
> I have a website, e.g. www.example.com. When I crawl it with Nutch 1.4,
> the problem is duplicated crawling. There are a number of pages like
> www.example.com/s38r84rejkfndn/xyz.aspx, and the segment s38r84rejkfndn
> changes every time you visit the page, so the crawler fetches it again
> and again: to Nutch it looks like a new URL every time. Please suggest
> how to overcome this issue.
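To sanity-check the exclusion pattern before putting it into Nutch, a quick standalone sketch (the token format, a single alphanumeric path segment, is an assumption based on the example URL above):

```python
import re

# Mirrors the hypothetical regex-urlfilter.txt rule: match URLs whose
# first path segment is a rotating alphanumeric token before xyz.aspx.
SESSION_URL = re.compile(r"^https?://www\.example\.com/[a-zA-Z0-9]+/xyz\.aspx$")

def is_session_url(url: str) -> bool:
    """Return True if the URL matches the rotating-token pattern."""
    return bool(SESSION_URL.match(url))
```

For example, `is_session_url("http://www.example.com/s38r84rejkfndn/xyz.aspx")` is True, while a normal page such as `http://www.example.com/about.aspx` is not matched.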

