I was talking with Ken Krugler off-list about the Mahout + Solr recommender and 
he had an interesting request. 

When calculating the indicator/item similarity matrix using ItemSimilarityJob, 
there is a --threshold option. Wouldn’t it be better to have an option that 
specifies the fraction of values kept in the entire matrix, based on their 
similarity strength? This is very difficult to do with --threshold. It would be 
like expressing the threshold as a fraction of the total number of values rather 
than as a strength value. It seems like this would have the effect of tossing 
the least interesting similarities, whereas limiting per item 
(--maxSimilaritiesPerItem) could easily toss some of the most interesting.
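
To make the difference concrete, here is a minimal sketch in plain Python. It is 
not Mahout code and does not use ItemSimilarityJob; the function names, the 
(row item, column item, strength) tuple layout, and the toy values are all made 
up for illustration. It compares keeping a global fraction of entries with 
keeping a fixed number per item.

```python
import math
from collections import defaultdict


def keep_top_fraction(sims, fraction):
    """Keep the strongest `fraction` of all entries in the whole matrix.

    `sims` is a list of (row_item, col_item, strength) tuples (hypothetical layout).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Number of entries to keep, rounded up so a non-empty input keeps at least one.
    k = math.ceil(fraction * len(sims))
    ranked = sorted(sims, key=lambda entry: entry[2], reverse=True)
    return ranked[:k]


def keep_top_per_item(sims, max_per_item):
    """Keep at most `max_per_item` strongest entries for each row item."""
    by_item = defaultdict(list)
    for entry in sims:
        by_item[entry[0]].append(entry)
    kept = []
    for entries in by_item.values():
        entries.sort(key=lambda entry: entry[2], reverse=True)
        kept.extend(entries[:max_per_item])
    return kept


# Toy data: item "a" has several strong similarities, "x" and "y" only weak ones.
sims = [
    ("a", "b", 0.9), ("a", "c", 0.8), ("a", "d", 0.7),
    ("x", "y", 0.2), ("x", "z", 0.1), ("y", "z", 0.05),
]

global_kept = keep_top_fraction(sims, 0.5)
per_item_kept = keep_top_per_item(sims, 1)

# The strength of the weakest surviving entry is the threshold that the
# fraction implies for this particular matrix.
implied_threshold = min(entry[2] for entry in global_kept)
```

With this toy data, the global fraction keeps the three strongest entries (all 
for item "a"), while the per-item limit of 1 throws away ("a", "c", 0.8) and 
("a", "d", 0.7) but keeps ("y", "z", 0.05). A global sort is only practical for 
small inputs; a distributed job would presumably need to estimate the cutoff 
from a sample or a quantile sketch instead, which is an assumption on my part.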

At the very least, it seems like a better way of expressing the threshold, 
doesn’t it?
