BELLINI ADAM wrote:
Hi,
Dedup doesn't work for me.
I have read that duplicates have either the same content (via MD5 hash) or
the same URL.
In my case the URLs are different, but the content behind those URLs is
the same.
For example, I have three URLs that have the same content:
1- www.domaine/folder/
2- www.domaine/folder/index.html
3- www.domaine/folder/index.html?lang=fr
but I find all of them in my index :(
I was expecting dedup to delete 1 and 2.
Dedup doesn't seem to work correctly!
Please check the value of the Signature field for all the above URLs in
your crawldb.
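
To illustrate why the Signature values matter, here is a minimal Python sketch (not Nutch code) of content-based duplicate detection with an MD5 hash. The page bodies are made up for the example, and the helper names signature and group_by_signature are hypothetical. It assumes the signature is an MD5 digest of the raw page bytes, so pages that look the same in a browser but differ by even one byte get different signatures and would not be treated as duplicates.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def signature(content: bytes) -> str:
    """Return the MD5 hex digest of the raw page bytes (assumed signature scheme)."""
    return hashlib.md5(content).hexdigest()


def group_by_signature(pages: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group URLs by content signature; each group is one set of duplicates."""
    groups = defaultdict(list)
    for url, content in pages.items():
        groups[signature(content)].append(url)
    return dict(groups)


# Made-up page bodies: the first two are byte-identical, the third differs
# only by a lang attribute, which is enough to change the MD5 digest.
pages = {
    "www.domaine/folder/": b"<html><body>Hello</body></html>",
    "www.domaine/folder/index.html": b"<html><body>Hello</body></html>",
    "www.domaine/folder/index.html?lang=fr": b'<html lang="fr"><body>Hello</body></html>',
}

groups = group_by_signature(pages)
for sig, urls in groups.items():
    print(sig, urls)
```

If the three URLs end up with different signatures in the crawldb, something similar is probably happening: the server returns slightly different bytes for each URL, so a hash of the content does not match.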
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com