Hello,
For reference, I am running Nutch 0.8 against Hadoop 0.4.
I want to add duplicate deletion based on a similarity algorithm, as opposed
to the hash method that is currently in there.
I have to say I am pretty lost as to how the delete duplicates class
works.
I would guess that I need to implement a compareTo method, but I am not
really sure what it should return. Also, once I do return something, where do I
implement the functionality that says "yes, these are dupes, so remove the
first one"?
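
To make the question concrete, here is a minimal standalone sketch of the kind of similarity test I have in mind. It is not Nutch code and does not use any Nutch or Hadoop classes: the class name SimilarityDedupSketch, the shingle size of 2 words, and the 0.9 threshold are all made up for illustration, and Jaccard similarity over word shingles is only one possible choice of algorithm.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Nutch.
class SimilarityDedupSketch {

    // Split text into overlapping runs of k lower-cased words ("shingles").
    // Texts shorter than k words become a single shingle.
    static Set<String> shingles(String text, int k) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        Set<String> out = new HashSet<>();
        if (words.length < k) {
            out.add(String.join(" ", words));
            return out;
        }
        for (int i = 0; i + k <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return out;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range [0, 1].
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 1.0 : (double) common.size() / union;
    }

    // Two documents count as duplicates when their similarity reaches the threshold.
    static boolean isDuplicate(String docA, String docB, int k, double threshold) {
        return jaccard(shingles(docA, k), shingles(docB, k)) >= threshold;
    }

    public static void main(String[] args) {
        String first = "the quick brown fox jumps over the lazy dog";
        String second = "the quick brown fox jumps over the lazy cat";
        // 0.9 is an assumed threshold, chosen arbitrarily for this sketch.
        System.out.println(isDuplicate(first, second, 2, 0.9));
    }
}
```

Note that a test like this only answers "similar or not" for a pair of documents. It does not give a consistent ordering, which is part of why I am unsure what compareTo should return.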

Can anyone help out?
Thanks,
S
