I also don't understand why they have three different signatures, since it's really the same page!
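To make the question concrete, here is a minimal sketch of how a content-hash signature behaves. It assumes the signature is a plain MD5 over the raw page bytes, which may not be exactly what my setup computes. The page bodies are made up for illustration, and `md5_signature` is a hypothetical helper, not a Nutch function.

```python
import hashlib


def md5_signature(content: bytes) -> str:
    """Return the hex MD5 digest of the raw page bytes (hypothetical helper)."""
    return hashlib.md5(content).hexdigest()


# Made-up page bodies: the third differs from the others by one trailing newline.
pages = {
    "www.domaine/folder/": b"<html><body>Hello</body></html>",
    "www.domaine/folder/index.html": b"<html><body>Hello</body></html>",
    "www.domaine/folder/index.html?lang=fr": b"<html><body>Hello</body></html>\n",
}

# One signature per URL, and the set of distinct signatures across all URLs.
signatures = {url: md5_signature(body) for url, body in pages.items()}
distinct = set(signatures.values())

for url, sig in signatures.items():
    print(sig, url)
print(len(distinct), "distinct signature(s)")
```

If the signature works this way, pages that look identical in a browser would still get different signatures whenever the fetched bytes differ at all (whitespace, a timestamp, a session id in a link), which might explain what I am seeing.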
> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: dedup dont delete duplicates !
> Date: Tue, 24 Nov 2009 20:56:39 +0000
>
> Hi,
>
> dedup doesn't work for me.
> I have read that duplicates have either the same content (via MD5 hash) or
> the same URL.
> In my case the URLs are not the same, but the content behind those URLs is.
> Here is an example: I have three URLs that have the same content:
>
> 1- www.domaine/folder/
> 2- www.domaine/folder/index.html
> 3- www.domaine/folder/index.html?lang=fr
>
> but I find all of them in my index :(
> I expected dedup to delete 1 and 2.
>
> dedup doesn't work correctly!!