Yep.

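By concatenating p hash keys (generated from p independent hash functions)
for each user, the probability that any 2 users agree on a concatenated
hash key is S(ui,uj)^p, which makes the clusters more refined. Here
S(ui,uj) is the Jaccard coefficient (the similarity coefficient) of users
ui and uj. So a larger key groups value means fewer collisions, and the
users that do collide are more similar on average.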
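
To make the S(ui,uj)^p claim concrete, here is a small Python sketch. It is
not Mahout code: the function names, the two example users, and the use of
random rankings in place of real hash functions are all my own assumptions,
chosen only so that each min-hash agrees with probability exactly S(ui,uj).
It compares the predicted collision rate S^p with a simulated one.

```python
import random

def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two item sets."""
    return len(a & b) / len(a | b)

def min_hash(items, rank):
    """Return the item with the smallest rank: one min-hash value."""
    return min(items, key=rank.__getitem__)

def concatenated_key(items, ranks):
    """Concatenate one min-hash per ranking into a single key (a tuple)."""
    return tuple(min_hash(items, rank) for rank in ranks)

def estimate_key_collision(a, b, p, trials=20000, seed=42):
    """Estimate how often a and b get the same key built from p min-hashes."""
    rng = random.Random(seed)
    universe = sorted(a | b)
    hits = 0
    for _ in range(trials):
        # Assumption: p independent random rankings stand in for p hash functions.
        ranks = [{x: rng.random() for x in universe} for _ in range(p)]
        if concatenated_key(a, ranks) == concatenated_key(b, ranks):
            hits += 1
    return hits / trials

# Two hypothetical users sharing 2 of 6 distinct items, so S = 2/6.
ui = {1, 2, 3, 4}
uj = {3, 4, 5, 6}
s = jaccard(ui, uj)

for p in (1, 2, 3):
    # Columns: p, predicted S^p, simulated collision rate.
    print(p, round(s ** p, 4), round(estimate_key_collision(ui, uj, p), 4))
```

The simulated rate should track S^p closely, dropping quickly as p grows,
which is why a larger key groups value leaves only the more similar users
sharing a key.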


On Tue, Nov 8, 2011 at 12:20 PM, Grant Ingersoll <[email protected]> wrote:

> From MAHOUT-344, from the patch author:
>
> The idea behind keyGroups is to concatenate hashes from multiple hash
> functions to reduce the probability of collision between 2 users that agreed
> on 1 or more individual hash values. This essentially improves the average
> similarity of users in a cluster.
>
> -Grant
>
> On Nov 7, 2011, at 8:54 PM, Suneel Marthi wrote:
>
> > Do we have an answer for this?
> >
> > Sent from my iPhone
> >
> > On Nov 2, 2011, at 7:20 AM, Grant Ingersoll <[email protected]> wrote:
> >
> >> What's the Minhash key groups value used for in the MinhashDriver?  I
> >> mean, I see it is used for building up the key out of the hashed values,
> >> but what's the significance of different values for it?  The default is 2,
> >> what does it mean practically speaking if I choose, say, 10?  AFAICT, it
> >> would mean that I would have more clusters, assuming that we still meet the
> >> minimum cluster size imposed by the reducer?
> >>
> >> Thanks,
> >> Grant
>
>
>
