Stefan Groschupf wrote:
I copy a working index and merge the original and the old together. Then I run the dedup over this index. Shouldn't the dedup tool remove the duplicates in the merged index?

I usually dedup before merging indexes, so that the merged index contains no duplicates. The mapred dedup tool should also work after merging, although it expects a directory of indexes, not a single index. Note again that it does not yet dedup by URL, only by MD5 of content.
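
To illustrate what deduplicating by MD5 of content (and not by URL) means in practice, here is a minimal Python sketch. It is not the Nutch dedup tool or its API: the function name, the (url, content) record layout, and the keep-the-first-seen policy are all assumptions made for this example.

```python
import hashlib


def dedup_by_content_md5(indexes):
    """Drop records whose content hash was already seen.

    `indexes` is a list of indexes (hypothetical layout), each a list of
    (url, content) tuples. The first record seen for a given content
    hash is kept; this policy is an assumption of the sketch.
    """
    seen = set()
    kept = []
    for index in indexes:
        for url, content in index:
            digest = hashlib.md5(content.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # same content already kept, regardless of URL
            seen.add(digest)
            kept.append((url, content))
    return kept


index_a = [("http://a.example/", "hello"), ("http://b.example/", "world")]
index_b = [("http://c.example/", "hello"), ("http://a.example/", "hello, updated")]
result = dedup_by_content_md5([index_a, index_b])
```

In this sketch the record for http://c.example/ is dropped because its content matches an earlier record, while both records for http://a.example/ survive because their content differs. That second case is the kind of duplicate that a URL-based dedup would be needed to catch.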

Doug



_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers