The threshold should not normally be used in the Mahout+Solr deployment style.
This need is better supported by specifying the maximum number of indicators. This is mathematically equivalent to specifying a fraction of values, but is more meaningful to users since good values for this number are pretty consistent across different uses (50-100 are reasonable values for most needs, though larger values are quite plausible).

On Tue, May 27, 2014 at 8:08 AM, Pat Ferrel <[email protected]> wrote:

> I was talking with Ken Krugler off list about the Mahout + Solr
> recommender and he had an interesting request.
>
> When calculating the indicator/item similarity matrix using
> ItemSimilarityJob there is a --threshold option. Wouldn’t it be better to
> have an option that specified the fraction of values kept in the entire
> matrix based on their similarity strength? This is very difficult to do
> with --threshold. It would be like expressing the threshold as a fraction
> of total number of values rather than a strength value. Seems like this
> would have the effect of tossing the least interesting similarities where
> limiting per item (--maxSimilaritiesPerItem) could easily toss some of the
> most interesting.
>
> At very least it seems like a better way of expressing the threshold,
> doesn’t it?
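
For anyone following the thread, here is a rough sketch of the three pruning styles being compared: a strength threshold, a per-item cap, and a global fraction. This is plain Python on a toy matrix, not Mahout code; the function names and the similarity values are made up for illustration and do not correspond to anything in ItemSimilarityJob.

```python
# Toy item-item similarity scores; values are invented for illustration only.
similarities = {
    "a": {"b": 0.9, "c": 0.8, "d": 0.7},
    "b": {"a": 0.9, "c": 0.2},
    "c": {"a": 0.8, "b": 0.2, "d": 0.1},
}


def prune_by_threshold(sims, threshold):
    """Keep every entry whose strength is at least `threshold`."""
    return {
        item: {other: s for other, s in row.items() if s >= threshold}
        for item, row in sims.items()
    }


def prune_per_item(sims, max_per_item):
    """Keep the strongest `max_per_item` entries in each row."""
    pruned = {}
    for item, row in sims.items():
        top = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
        pruned[item] = dict(top[:max_per_item])
    return pruned


def prune_by_fraction(sims, fraction):
    """Keep the strongest `fraction` of all entries across the whole matrix."""
    entries = sorted(
        ((s, item, other) for item, row in sims.items() for other, s in row.items()),
        reverse=True,
    )
    keep = int(len(entries) * fraction)
    pruned = {item: {} for item in sims}
    for s, item, other in entries[:keep]:
        pruned[item][other] = s
    return pruned


per_item = prune_per_item(similarities, 2)
by_fraction = prune_by_fraction(similarities, 0.5)
by_threshold = prune_by_threshold(similarities, 0.75)
```

On this toy data a per-item cap of 2 drops the a-to-d entry (0.7) while keeping b-to-c (0.2), which is the effect the quoted message worries about. Keeping the top half of all entries gives the same result as a threshold of 0.75, but without having to know that strength value in advance.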
