Hi,
I found many pages that share the same title, and their contents are almost 
identical. I would like to index pages with the same title only once. How can 
I recognize pages with the same title during the indexing process?
Also, how does Nutch remove pages with the same page content, and in which 
class/package can I find the code?

Thanks
-Qi
