Re: Generating a Document Similarity Matrix

Ted Dunning Tue, 15 Jun 2010 10:01:23 -0700

Thresholds are generally dangerous.  It is usually preferable to specify the
sparseness you want (1%, 0.2%, whatever), sort the results in descending
score order using Hadoop's built-in sorting, keep only that top fraction,
and drop the rest.
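
To make that concrete, here is a minimal in-memory sketch of the "keep the
top fraction" idea for a single row of similarity scores.  It is plain Java
and does not use Hadoop or Mahout at all; the class TopFraction and the
method keepTopFraction are hypothetical names made up for illustration.  In
a real job the descending sort would more likely come from the shuffle than
from an in-memory list, so treat this only as an illustration of the
selection step, under the assumption that one row fits in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
final class TopFraction {
    private TopFraction() {}

    /**
     * Returns the highest-scoring entries of {@code scores}, in descending
     * score order, keeping roughly {@code fraction} of them (0.01 for 1%).
     * The count is rounded up, so any fraction > 0 keeps at least one entry
     * of a non-empty map.
     */
    static <K> List<Map.Entry<K, Double>> keepTopFraction(
            Map<K, Double> scores, double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        List<Map.Entry<K, Double>> sorted = new ArrayList<>(scores.entrySet());
        // Sort by score, highest first.
        sorted.sort(Map.Entry.<K, Double>comparingByValue().reversed());
        // Number of entries to keep for the requested sparseness.
        int keep = (int) Math.ceil(fraction * sorted.size());
        // Copy so the result does not depend on the temporary sorted list.
        return new ArrayList<>(sorted.subList(0, keep));
    }
}
```

With this approach the output size is fixed by the requested fraction,
whereas the number of entries that pass a fixed score threshold depends on
how the scores happen to be distributed.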


On Tue, Jun 15, 2010 at 9:32 AM, Kris Jack <[email protected]> wrote:

>  I was wondering if there was an
> interesting way to do this with the current mahout code such as requesting
> that the Vector accumulator returns only elements that have values greater
> than a given threshold, sorting the vector by value rather than key, or
> something else?
>
