From the patch author on MAHOUT-344: The idea behind keyGroups is to concatenate hashes from multiple hash functions to reduce the probability of collision between two users that agree on one or more individual hash values. This essentially improves the average similarity of users in a cluster.
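A minimal sketch of the idea above, assuming non-overlapping groups of keyGroups minhash values (standard LSH "banding"). The class and method names (`KeyGroupsSketch`, `clusterKeys`, `sharedKeys`, `keyCollisionProbability`) are hypothetical, and Mahout's MinHashMapper may group the hash values differently. The point it illustrates: if two users have Jaccard similarity s, a single minhash value agrees with probability s, so a key built from k independent values agrees with probability s^k.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Mahout's MinHashMapper: shows how concatenating
// keyGroups minhash values into one key changes which users share a key.
public class KeyGroupsSketch {

    // Concatenate non-overlapping groups of keyGroups values into keys.
    // Assumption: trailing values that do not fill a whole group are ignored.
    static List<String> clusterKeys(int[] minHashes, int keyGroups) {
        List<String> keys = new ArrayList<>();
        for (int start = 0; start + keyGroups <= minHashes.length; start += keyGroups) {
            StringBuilder key = new StringBuilder();
            for (int j = start; j < start + keyGroups; j++) {
                if (j > start) {
                    key.append('-');
                }
                key.append(minHashes[j]);
            }
            keys.add(key.toString());
        }
        return keys;
    }

    // Keys two users share, compared group by group (same position).
    static List<String> sharedKeys(List<String> a, List<String> b) {
        List<String> shared = new ArrayList<>();
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            if (a.get(i).equals(b.get(i))) {
                shared.add(a.get(i));
            }
        }
        return shared;
    }

    // One minhash agrees with probability equal to the Jaccard similarity,
    // so k independent values must all agree: probability s^k.
    static double keyCollisionProbability(double jaccard, int keyGroups) {
        return Math.pow(jaccard, keyGroups);
    }

    public static void main(String[] args) {
        // Hypothetical signatures: the users agree on 3 of 6 minhash values.
        int[] userA = {7, 3, 9, 4, 1, 8};
        int[] userB = {7, 5, 9, 4, 2, 6};

        for (int k : new int[] {1, 2, 3}) {
            List<String> shared = sharedKeys(clusterKeys(userA, k), clusterKeys(userB, k));
            System.out.println("keyGroups=" + k + " shared keys: " + shared);
        }
        for (int k : new int[] {1, 2, 10}) {
            System.out.printf("s=0.5, keyGroups=%d -> P(same key)=%.6f%n",
                    k, keyCollisionProbability(0.5, k));
        }
    }
}
```

With keyGroups=1 the two users share three keys, with keyGroups=2 only "9-4", and with keyGroups=3 none. For s=0.5 the per-key collision probability drops from 0.25 at the default of 2 to roughly 0.001 at 10. So a larger keyGroups makes each key more selective: clusters become tighter and smaller, and more of them may fall below the reducer's minimum cluster size.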
-Grant

On Nov 7, 2011, at 8:54 PM, Suneel Marthi wrote:

> Do we have an answer for this?
>
> On Nov 2, 2011, at 7:20 AM, Grant Ingersoll wrote:
>
>> What's the Minhash key groups value used for in the MinhashDriver? I mean,
>> I see it is used for building up the key out of the hashed values, but
>> what's the significance of different values for it? The default is 2, what
>> does it mean practically speaking if I choose, say, 10? AFAICT, it would
>> mean that I would have more clusters, assuming that we still meet the
>> minimum cluster size imposed by the reducer?
>>
>> Thanks,
>> Grant
