The threshold should not normally be used in the Mahout+Solr deployment
style.

This need is better supported by specifying the maximum number of
indicators per item. This is mathematically equivalent to specifying a
fraction of values, but it is more meaningful to users because good values
for this number are fairly consistent across different uses (50-100 is
reasonable for most needs; larger values are also quite plausible).
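A minimal Python sketch of per-item limiting, assuming the similarity matrix fits in memory as a dict of dicts (item -> other item -> score). The helpers `top_k_per_item` and `kept_fraction` and the example scores are hypothetical; this is not how Mahout's ItemSimilarityJob implements --maxSimilaritiesPerItem. When every item has the same number of candidate indicators, keeping k per item keeps exactly k divided by that row length of all values.

```python
import heapq
from typing import Dict

# Hypothetical in-memory layout: item -> {other item -> similarity score}
Matrix = Dict[str, Dict[str, float]]


def top_k_per_item(sims: Matrix, k: int) -> Matrix:
    """Keep at most the k strongest indicators for each item."""
    pruned: Matrix = {}
    for item, row in sims.items():
        # nlargest picks the k (other, score) pairs with the highest score
        best = heapq.nlargest(k, row.items(), key=lambda kv: kv[1])
        pruned[item] = dict(best)
    return pruned


def kept_fraction(original: Matrix, pruned: Matrix) -> float:
    """Share of all similarity values that survived pruning."""
    total = sum(len(row) for row in original.values())
    kept = sum(len(row) for row in pruned.values())
    return kept / total if total else 0.0


# Made-up example: 4 items, 3 candidate indicators each
sims: Matrix = {
    "a": {"b": 0.9, "c": 0.2, "d": 0.5},
    "b": {"a": 0.9, "c": 0.7, "d": 0.1},
    "c": {"a": 0.2, "b": 0.7, "d": 0.3},
    "d": {"a": 0.5, "b": 0.1, "c": 0.3},
}
pruned = top_k_per_item(sims, 2)
print(pruned["c"])  # {'b': 0.7, 'd': 0.3}
print(kept_fraction(sims, pruned))  # 2 of 3 candidates per row are kept
```

With k = 2 and 3 candidates per row, 8 of the 12 values survive, i.e. the same count a global fraction of 2/3 would keep.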

On Tue, May 27, 2014 at 8:08 AM, Pat Ferrel <[email protected]> wrote:

> I was talking with Ken Krugler off list about the Mahout + Solr
> recommender and he had an interesting request.
>
> When calculating the indicator/item similarity matrix using
> ItemSimilarityJob there is a --threshold option. Wouldn’t it be better to
> have an option that specified the fraction of values kept in the entire
> matrix based on their similarity strength? This is very difficult to do
> with --threshold. It would be like expressing the threshold as a fraction
> of the total number of values rather than a strength value. Seems like this
> would have the effect of tossing the least interesting similarities, whereas
> limiting per item (--maxSimilaritiesPerItem) could easily toss some of the
> most interesting.
>
> At very least it seems like a better way of expressing the threshold,
> doesn’t it?
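The global-fraction pruning proposed in the quoted message can be sketched in Python as follows, again assuming an in-memory dict-of-dicts matrix. `keep_global_fraction` and the example scores are hypothetical; ItemSimilarityJob is not known to offer such an option. The sketch ranks every value in the whole matrix and keeps the strongest share, rounding the kept count down.

```python
import heapq
from typing import Dict

# Hypothetical in-memory layout: item -> {other item -> similarity score}
Matrix = Dict[str, Dict[str, float]]


def keep_global_fraction(sims: Matrix, fraction: float) -> Matrix:
    """Keep the strongest `fraction` of all similarity values in the matrix."""
    # Flatten the matrix into (score, item, other) triples so they rank by score
    entries = [
        (score, item, other)
        for item, row in sims.items()
        for other, score in row.items()
    ]
    n_keep = int(fraction * len(entries))  # floor of the requested share
    pruned: Matrix = {item: {} for item in sims}
    for score, item, other in heapq.nlargest(n_keep, entries):
        pruned[item][other] = score
    return pruned


# Made-up example: 4 items, 3 candidate indicators each
sims: Matrix = {
    "a": {"b": 0.9, "c": 0.2, "d": 0.5},
    "b": {"a": 0.9, "c": 0.7, "d": 0.1},
    "c": {"a": 0.2, "b": 0.7, "d": 0.3},
    "d": {"a": 0.5, "b": 0.1, "c": 0.3},
}
pruned = keep_global_fraction(sims, 0.5)
print({item: len(row) for item, row in pruned.items()})
# {'a': 2, 'b': 2, 'c': 1, 'd': 1}
```

Unlike a per-item limit, a single global cutoff leaves items with uneven numbers of indicators: here items whose similarities are all relatively weak keep fewer.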
