Hello Eyeris, there is no such feature in Nutch right now. I do seem to remember having a plugin that supports the canonical link element, as well as canonical URLs supplied via HTTP headers and og:url. It applied URL normalizing and filtering, of course, and used robots=noindex to keep duplicates out of the index.
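Purely as an illustration of the three sources mentioned above (an HTTP `Link` header, a `<link rel="canonical">` element, and `og:url`), here is a minimal Java sketch. It is not that plugin and not a Nutch API. The class and method names are hypothetical. The precedence order (header, then link element, then og:url) is an assumption. Regex matching of HTML is a simplification that a real parse filter would avoid by walking the DOM.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; NOT a Nutch plugin or Nutch API.
public class CanonicalUrlSketch {

    // Matches e.g.  Link: <https://example.com/a>; rel="canonical"
    private static final Pattern LINK_HEADER = Pattern.compile(
        "<([^>]+)>\\s*;[^,]*rel=\"?canonical\"?", Pattern.CASE_INSENSITIVE);
    // Matches <link rel="canonical" href="..."> (assumes rel appears before href)
    private static final Pattern LINK_TAG = Pattern.compile(
        "<link[^>]*rel=[\"']canonical[\"'][^>]*href=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);
    // Matches <meta property="og:url" content="..."> (assumes property appears before content)
    private static final Pattern OG_URL = Pattern.compile(
        "<meta[^>]*property=[\"']og:url[\"'][^>]*content=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Assumed precedence: HTTP Link header, then link rel=canonical, then og:url. */
    public static Optional<String> resolve(String pageUrl, Map<String, List<String>> headers, String html) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // Some HTTP clients use a null key for the status line, so guard against it.
            if (e.getKey() != null && e.getKey().equalsIgnoreCase("Link")) {
                for (String value : e.getValue()) {
                    Matcher m = LINK_HEADER.matcher(value);
                    if (m.find()) {
                        return Optional.of(absolute(pageUrl, m.group(1)));
                    }
                }
            }
        }
        for (Pattern p : List.of(LINK_TAG, OG_URL)) {
            Matcher m = p.matcher(html);
            if (m.find()) {
                return Optional.of(absolute(pageUrl, m.group(1)));
            }
        }
        return Optional.empty();
    }

    // Resolves a relative canonical href against the page URL and applies basic URI normalization
    // (a stand-in for Nutch's configurable URL normalizers, which this sketch does not call).
    private static String absolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).normalize().toString();
    }

    /** A page whose canonical URL points elsewhere is treated as a duplicate, i.e. a noindex candidate. */
    public static boolean isDuplicate(String pageUrl, Map<String, List<String>> headers, String html) {
        String self = URI.create(pageUrl).normalize().toString();
        return resolve(pageUrl, headers, html).map(c -> !c.equals(self)).orElse(false);
    }
}
```

A real plugin would presumably run logic like this at parse time and pass the canonical URL through the configured normalizers and URL filters before comparing. The `isDuplicate` check corresponds to the robots=noindex idea: keep the page in the CrawlDB but do not send it to Solr.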
You can also try to improve on the patch attached to NUTCH-710. The comments there give excellent guidance.
M.

-----Original message-----
> From: Eyeris Rodriguez Rueda <[email protected]>
> Sent: Wednesday 26th October 2016 22:01
> To: [email protected]
> Subject: about canonical pages to avoid duplicates pages
>
> Hi all.
> I'm using Nutch 1.12 and Solr 4.10.3 in local mode.
> I have detected a lot of duplicate pages in the CrawlDB. Maybe by using the canonical
> attribute I can reduce the duplicate pages in the CrawlDB.
> I have read an old post (see below) on this interesting topic.
> https://issues.apache.org/jira/browse/NUTCH-710
>
> Is this feature supported by Nutch or not?

