[ https://issues.apache.org/jira/browse/NUTCH-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494537 ]
Sami Siren commented on NUTCH-476:
----------------------------------

md5 sum (or any other configurable "digest") is already calculated in fetcher or parser and dedup can be used to remove duplicates.

> Would like to add a field to the document class for its MD5 signature
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-476
>                 URL: https://issues.apache.org/jira/browse/NUTCH-476
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>         Environment: all
>            Reporter: Linh Pham
>            Priority: Minor
>
> During indexing a file, if an MD5 signature was calculated and stored along
> with the document as a default,
> it could then be used to remove duplicates from the results on retrieval.
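
For illustration only, the idea discussed in this issue (compute a digest per document, then drop documents whose digest has already been seen) can be sketched in plain Java as below. This is not Nutch's fetcher, parser, or dedup code: the class name DigestDedup and the methods md5Hex and dedup are hypothetical, and the sketch assumes documents are simple in-memory strings compared by exact content.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Nutch.
class DigestDedup {

    // Returns the hex-encoded MD5 digest of the raw content bytes.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Keeps the first document seen for each digest and drops later duplicates.
    static List<String> dedup(List<String> documents) {
        Set<String> seenDigests = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String doc : documents) {
            String digest = md5Hex(doc.getBytes(StandardCharsets.UTF_8));
            if (seenDigests.add(digest)) {
                unique.add(doc);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("hello nutch", "another page", "hello nutch");
        System.out.println(dedup(docs)); // prints [hello nutch, another page]
    }
}
```

An exact-content digest like this only catches byte-identical duplicates; the comment's mention of a configurable "digest" suggests other signature schemes could be swapped in for md5Hex.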