On Jul 8, 2010, at 2:21 AM, JAGANADH G wrote:

> On Wed, Jul 7, 2010 at 11:49 PM, Grant Ingersoll <[email protected]> wrote:
>
>> How do you want to determine copy? Strictly or loosely? Solr and Nutch
>> have some deduplication capabilities, including fuzzy matching. They
>> probably could be brought into Mahout, too.
>>
>> -Grant
>
> Dear Grant
> I am trying to make a strict match.
> I will try Solr and Nutch.

So, then you can do a checksum or something like that, right?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search
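
For reference, the checksum idea above could be sketched roughly as follows. This is only a minimal illustration, not how Solr, Nutch, or Mahout implement deduplication: the function names `checksum` and `find_exact_duplicates`, the choice of SHA-256, and the in-memory dict of documents are all assumptions made for the example. Documents are treated as duplicates only when their text is identical character for character.

```python
import hashlib
from collections import defaultdict


def checksum(text):
    """Return the SHA-256 hex digest of the UTF-8 bytes of `text`.

    SHA-256 is an assumption; any stable digest would serve the same role.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_exact_duplicates(docs):
    """Group document ids whose contents are exactly identical.

    `docs` is a hypothetical mapping of document id -> document text.
    Returns a list of id groups, one per set of identical documents.
    """
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        groups[checksum(text)].append(doc_id)
    # Keep only digests shared by more than one document.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = {
        "a": "hello world",
        "b": "hello world",
        "c": "hello  world",  # extra space, so not a strict match
    }
    print(find_exact_duplicates(sample))
```

Because the match is strict, any difference in whitespace or case yields a different digest. If that turns out to be too strict, the text could be normalized before hashing, or the fuzzy matching in Solr and Nutch might be the better fit.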
