Thresholds are generally dangerous. It is usually preferable to specify the sparseness you want (1%, 0.2%, whatever), sort the results in descending score order using Hadoop's built-in sorting, and just drop the rest.
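As a rough illustration of "pick a sparseness, sort descending, drop the rest", here is a minimal single-machine sketch. It is not Mahout or Hadoop API: the class `TopFraction` and method `keepTopFraction` are hypothetical names, and in a real Hadoop job the descending sort would be done by the framework during the shuffle rather than in memory like this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout or Hadoop.
class TopFraction {

    /**
     * Keeps roughly the highest-scoring fraction of entries (e.g. 0.01 for 1%)
     * and drops the rest, instead of applying a fixed score threshold.
     */
    static <K> List<Map.Entry<K, Double>> keepTopFraction(Map<K, Double> scores, double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        List<Map.Entry<K, Double>> entries = new ArrayList<>(scores.entrySet());
        // Sort in descending score order.
        entries.sort(Map.Entry.<K, Double>comparingByValue(Comparator.reverseOrder()));
        // Number of entries to keep, rounded to the nearest integer and capped at the list size.
        int keep = (int) Math.min(entries.size(), Math.round(fraction * entries.size()));
        // Drop everything past the cut-off.
        return new ArrayList<>(entries.subList(0, keep));
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("a", 0.90);
        scores.put("b", 0.10);
        scores.put("c", 0.50);
        scores.put("d", 0.95);
        scores.put("e", 0.30);
        scores.put("f", 0.20);
        scores.put("g", 0.40);
        scores.put("h", 0.60);
        scores.put("i", 0.70);
        scores.put("j", 0.05);
        // Keep the top 20% by score.
        for (Map.Entry<String, Double> e : keepTopFraction(scores, 0.2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The point of the design is that the output size is fixed by the sparseness you ask for, so it stays predictable even when the scale of the scores shifts from one data set to the next, which is exactly where a hand-picked threshold tends to break.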
On Tue, Jun 15, 2010 at 9:32 AM, Kris Jack wrote:

> I was wondering if there was an interesting way to do this with the current
> mahout code such as requesting that the Vector accumulator returns only
> elements that have values greater than a given threshold, sorting the vector
> by value rather than key, or something else?
