Hi,

Even though I ran nutch dedup on my index, I still have pages with different
URLs but exactly the same content (see the search result example below). From
what I have read about dedup, this shouldn't happen, since it deletes the URL
with the lowest score. Is there anything else I can try to get rid of these?
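
To check that I have understood it correctly, here is a minimal sketch of what
I assume dedup does. It is not Nutch's actual code: the function name, the
(url, score, content) tuples and the use of an MD5 hash over the page content
are my own assumptions, for illustration only.

```python
import hashlib


def dedup_by_content(pages):
    """Keep one URL per distinct content, preferring the highest score.

    `pages` is an iterable of (url, score, content) tuples (hypothetical
    layout, not Nutch's index format). Returns the list of URLs kept.
    """
    best = {}  # content digest -> (url, score) of the best page seen so far
    for url, score, content in pages:
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        # Replace the stored entry only if this page scores strictly higher.
        if digest not in best or score > best[digest][1]:
            best[digest] = (url, score)
    return [url for url, _ in best.values()]


if __name__ == "__main__":
    # Made-up scores and content, just to exercise the function.
    sample = [
        ("http://example.invalid/a", 1.0, "same text"),
        ("http://example.invalid/b", 2.0, "same text"),
    ]
    print(dedup_by_content(sample))
```

If that is roughly right, both of the results below could only survive if their
raw content differs somewhere (even by a single character), although the
summaries look identical.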

Thanks,
Ed.

Item Document :- Client - TeraTerm Pro
... Item Document :- Client - TeraTerm Pro Intranet - Technical Standards 
Online   Employee Self Service       ESS Home ... Description Document     
Technology Category: Client Name of item: TeraTerm Pro Related policy: Unix 
Access Tool Vendor: Current Technical Status ... standard Telnet tool. Where 
printing or keymapping is an issue, TeraTerm ...
http://www.somedomain.com/im/tech/technica.nsf/8918e269a19be23f802563ef004e8e7a/441cdf92bbe06a9e80256c87003d81d9?OpenDocument
 (cached) (explain) (anchors)



Item Document :- Client - TeraTerm Pro
... Item Document :- Client - TeraTerm Pro Intranet - Technical Standards 
Online   Employee Self Service       ESS Home ... Description Document     
Technology Category: Client Name of item: TeraTerm Pro Related policy: Unix 
Access Tool Vendor: Current Technical Status ... standard Telnet tool. Where 
printing or keymapping is an issue, TeraTerm ...
http://www.somedomain.com/im/tech/technica.nsf/dacff06c3e1dbc9780257273004e1e3b/441cdf92bbe06a9e80256c87003d81d9?OpenDocument
 (cached) (explain) (anchors) 
