Hi, I have found many pages that share the same title and have almost identical content. I would like to index pages with the same title only once. How can I recognize pages with the same title during the indexing process? Also, how does Nutch remove pages with the same content, and in which class/package can I find that code?
Thanks -Qi
