Andrzej Bialecki wrote:
The de-duplication algorithm should be abstracted and separated into a utility method/class - currently both DeleteDuplicates and SegmentMergeTool perform de-duplication, but I'm afraid that each follows a slightly different, hardcoded routine...

Perhaps the IndexedDoc nested class from DeleteDuplicates.java could be used as a basis for this? Its compareTo() method would need to be implemented, as would the compare() method in each of the comparators, since I implemented only the optimized binary version.
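
To make the idea concrete, here is a rough sketch of what a shared helper might look like. It is not the code in DeleteDuplicates or SegmentMergeTool: DocEntry is a hypothetical, simplified stand-in for IndexedDoc, its fields (a content hash string, score, index number, doc number) are assumptions, and Deduplicator and findDuplicates are made-up names. The ordering rule used here (same hash, higher score wins, then newer index) is also only an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for IndexedDoc: one document in one index. */
final class DocEntry implements Comparable<DocEntry> {
    final String contentHash; // assumed: hex digest of the page content
    final float score;        // assumed: document score
    final int index;          // assumed: number of the index holding the doc
    final int doc;            // assumed: doc number within that index

    DocEntry(String contentHash, float score, int index, int doc) {
        this.contentHash = contentHash;
        this.score = score;
        this.index = index;
        this.doc = doc;
    }

    /** Object-level ordering: by hash, then higher score first, then higher index first. */
    @Override
    public int compareTo(DocEntry other) {
        int c = contentHash.compareTo(other.contentHash);
        if (c != 0) return c;
        c = Float.compare(other.score, score); // reversed: higher score sorts first
        if (c != 0) return c;
        c = Integer.compare(other.index, index); // reversed: higher index sorts first
        if (c != 0) return c;
        return Integer.compare(doc, other.doc);
    }
}

/** Hypothetical shared utility that both tools could call. */
final class Deduplicator {
    private Deduplicator() {}

    /**
     * Returns the entries that should be deleted: for every content hash,
     * all entries except the one the given comparator orders first.
     */
    static List<DocEntry> findDuplicates(List<DocEntry> docs,
                                         Comparator<DocEntry> preference) {
        Map<String, DocEntry> best = new HashMap<>();
        List<DocEntry> toDelete = new ArrayList<>();
        for (DocEntry d : docs) {
            DocEntry kept = best.get(d.contentHash);
            if (kept == null) {
                best.put(d.contentHash, d);       // first entry seen for this hash
            } else if (preference.compare(d, kept) < 0) {
                toDelete.add(kept);               // new entry is preferred
                best.put(d.contentHash, d);
            } else {
                toDelete.add(d);                  // existing entry stays
            }
        }
        return toDelete;
    }
}
```

A caller would pass Comparator.naturalOrder() to use compareTo(), or a different Comparator to apply another policy, which is the point of pulling the routine out of the two tools. This sketch keeps everything in memory and compares objects; a byte-level compare() would still be needed wherever sorting is done on serialized records.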


Doug


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
