I also don't understand why they have three different signatures, since it's 
really the same page!
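
One possible explanation, sketched below in Python rather than taken from Nutch itself: if the signature is an MD5 digest of the raw page bytes (as the quoted message describes), then a single differing byte between the three responses is enough to produce three signatures, even when the pages look identical in a browser. The page bodies here are made up for illustration, and `normalized_signature` is a hypothetical looser signature, not a Nutch API.

```python
import hashlib
import re


def md5_signature(content: bytes) -> str:
    """Hex MD5 digest of the raw bytes; any single-byte change alters it."""
    return hashlib.md5(content).hexdigest()


def normalized_signature(content: bytes) -> str:
    """Hypothetical looser signature: drop tags, collapse whitespace, lowercase."""
    text = content.decode("utf-8", errors="replace")
    text = re.sub(r"<[^>]+>", " ", text)          # strip markup
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize whitespace and case
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Made-up bodies standing in for the three URLs: same visible text,
# but a trailing newline or a lang attribute changes the raw bytes.
page_folder = b"<html><body><p>Welcome</p></body></html>"
page_index = b"<html><body><p>Welcome</p></body></html>\n"
page_lang = b'<html lang="fr"><body><p>Welcome</p></body></html>'

pages = (page_folder, page_index, page_lang)
raw = {md5_signature(p) for p in pages}
loose = {normalized_signature(p) for p in pages}

print(len(raw), len(loose))  # prints "3 1"
```

So it may be worth fetching the three URLs and diffing the raw responses byte for byte before concluding that dedup itself is broken.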


> From: mbel...@msn.com
> To: nutch-user@lucene.apache.org
> Subject: dedup dont delete duplicates !
> Date: Tue, 24 Nov 2009 20:56:39 +0000
> 
> 
> 
> Hi,
> 
> dedup doesn't work for me.
> I have read that duplicates have either the same content (via MD5 hash) or 
> the same URL.
> In my case the URLs are not the same, but they still have the same 
> content.
> For example, I have three URLs that have the same content:
> 
> 1- www.domaine/folder/
> 2- www.domaine/folder/index.html
> 3- www.domaine/folder/index.html?lang=fr
> 
> but I find all of them in my index :(
> I was expecting dedup to delete 1 and 2.
> 
> dedup isn't working correctly!
>                                         
> _________________________________________________________________
> Windows Live: Make it easier for your friends to see what you’re up to on 
> Facebook.
> http://go.microsoft.com/?linkid=9691816
                                          
_________________________________________________________________
Windows Live: Make it easier for your friends to see what you’re up to on 
Facebook.
http://go.microsoft.com/?linkid=9691816
