On Tue, Apr 26, 2011 at 9:12 AM, Sean Owen <[email protected]> wrote:

> That reduces to something like the Jaccard / Tanimoto coefficient -- not
> precisely since you're dividing by the length of those vectors rather than
> the size of their "union", but practically similar. And that's implemented
> as TanimotoCoefficientSimilarity.
>

That is similar, but the normalizer is different.  The length of {1,1,0} =
sqrt(2), not 2.

It will have all the small count problems that Jaccard and Tanimoto have.
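
To make the normalizer difference concrete, here is a sketch, assuming the measure under discussion is cosine similarity on binary vectors. The symbols A and B (the sets of positions where each vector has a 1) are my notation, not from the thread.

```latex
\[
\cos(a,b) \;=\; \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
          \;=\; \frac{|A \cap B|}{\sqrt{|A|\,|B|}},
\qquad
T(a,b) \;=\; \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
\]
% Example: a = {1,1,0} has |A| = 2, so \lVert a \rVert = \sqrt{1^2 + 1^2 + 0^2} = \sqrt{2}.
```

Both have the overlap count in the numerator; only the denominator differs, which is why both behave badly when the counts are small.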


> Perhaps my point is that in Mahout (well the recommender end of the world),
> binary data is not {0,1} data but {null,1} data.


Sure.  But the number of 1's and the number of overlapping 1's are all that
is needed to do the computation.  Since we are only adding, the nulls
contribute nothing to the sums, so it doesn't much matter how many there are.
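
As a rough illustration of computing from counts alone, here is a minimal Java sketch. The class and method names (BinarySimilarity, cosine, tanimoto) are hypothetical and this is not Mahout's TanimotoCoefficientSimilarity code. Returning 0 for empty vectors is also my assumption.

```java
// Hypothetical helper, not part of Mahout: similarity of two binary
// ({null,1}) vectors computed purely from counts of 1's.
final class BinarySimilarity {

    private BinarySimilarity() {}

    // Cosine on binary vectors: overlap / sqrt(countA * countB).
    // Assumption: return 0 when either vector has no 1's.
    static double cosine(int countA, int countB, int overlap) {
        if (countA == 0 || countB == 0) {
            return 0.0;
        }
        return overlap / Math.sqrt((double) countA * countB);
    }

    // Tanimoto / Jaccard: overlap / size of the union.
    // Assumption: return 0 when the union is empty.
    static double tanimoto(int countA, int countB, int overlap) {
        int union = countA + countB - overlap;
        if (union == 0) {
            return 0.0;
        }
        return (double) overlap / union;
    }

    public static void main(String[] args) {
        // {1,1,0} vs {0,1,1}: two 1's each, one overlapping 1.
        System.out.println(cosine(2, 2, 1));
        System.out.println(tanimoto(2, 2, 1));
    }
}
```

Neither method takes the number of nulls as an argument, which is the point: it never enters the computation.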

> ... I had thought the question
> was "how do you do this in Mahout".
>

You answered that with your Tanimoto comment (but with an added code mod).
