How do you want to define a copy: strictly (exact duplicates) or loosely (near-duplicates)? Solr and Nutch both have some deduplication capabilities, including fuzzy matching. Those could probably be brought into Mahout, too.
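
To make the strict vs. loose distinction concrete, here is a rough sketch in Python. It is not how Solr, Nutch, or Mahout implement deduplication; it swaps in a generic technique (word shingles compared with Jaccard similarity) purely for illustration. The function names, the shingle size k=3, and any threshold you would apply to the score are all assumptions of this sketch.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def is_exact_copy(doc_a, doc_b):
    """Strict check: identical once runs of whitespace are collapsed."""
    return " ".join(doc_a.split()) == " ".join(doc_b.split())


def similarity(doc_a, doc_b, k=3):
    """Loose check: a score in [0, 1]; higher means more shared phrasing."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox leaps over the lazy dog"
    print(is_exact_copy(a, b))
    print(similarity(a, b))
```

The strict check only catches verbatim copies. The loose score degrades gradually as words are changed, so you would pick a cutoff that suits your data; for large collections, comparing every pair this way gets expensive and a hashing scheme such as MinHash is the usual next step.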
-Grant

On Jul 7, 2010, at 10:23 AM, JAGANADH G wrote:

> Dear All
>
> Is there any way or algorithm available to compare two documents?
> E.g. check if doc "A" is a copy (plagiarised version) of document "B".
>
> With regards
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog
