Regarding ItemSimilarityJob, it is my understanding that if there are two
input lines of the form <user1, product1> and <user1, product2>,
then that would constitute a co-occurrence between product1 and product2.
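
To make that assumption concrete, here is a minimal sketch in plain Python
(not Mahout code) of the counting I have in mind. The function name
cooccurrence_counts and the sample IDs are made up for illustration, and
this only mirrors my understanding of a co-occurrence, not necessarily what
ItemSimilarityJob does internally.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(pairs):
    """Count, for each unordered item pair, how many users have both items.

    `pairs` is an iterable of (user_id, item_id) tuples, i.e. one tuple per
    input line. No preference values are used (boolean data).
    """
    # Group items by user; a set ignores duplicate (user, item) lines.
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    # Every pair of distinct items seen for the same user co-occurs once.
    counts = Counter()
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical sample data: only product1 and product2 share users.
sample = [
    ("user1", "product1"),
    ("user1", "product2"),
    ("user2", "product1"),
    ("user2", "product2"),
    ("user3", "product3"),
]

print(cooccurrence_counts(sample))
```

Under this definition product3 never co-occurs with anything, because no
user has it together with another product.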

I've generated a large test dataset under this assumption, constructed so
that co-occurrences can only exist between pairs of product IDs that
I've predefined. I'm not using preference values, and I'm setting
--booleanData true.

While the ItemSimilarityJob's output does include these predefined
co-occurrences, it also outputs a large number of co-occurrences (with small
co-occurrence counts) between products that are not co-occurring in the
input dataset. Can anyone provide some insight as to why this might be
happening?
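
One way to narrow this down might be to check each reported pair against
the raw input. The sketch below is plain Python with hypothetical names
(unexpected_pairs, the sample IDs); it assumes the input has already been
parsed into (user, item) tuples and the job's output into (itemA, itemB)
tuples, since I don't want to assume the exact output file format here.

```python
from collections import defaultdict


def unexpected_pairs(input_pairs, output_pairs):
    """Return the output pairs whose two items share no user in the input.

    `input_pairs` holds (user_id, item_id) tuples from the input data;
    `output_pairs` holds (item_a, item_b) tuples reported by the job.
    """
    # Index the input: which users are attached to each item?
    users_by_item = defaultdict(set)
    for user, item in input_pairs:
        users_by_item[item].add(user)

    unexpected = []
    for a, b in output_pairs:
        shared = users_by_item.get(a, set()) & users_by_item.get(b, set())
        if not shared:
            # No single user has both items, so by the definition above
            # this pair does not co-occur in the input.
            unexpected.append((a, b))
    return unexpected


# Hypothetical data: product1/product2 share user1, product3 is isolated.
input_pairs = [
    ("user1", "product1"),
    ("user1", "product2"),
    ("user2", "product3"),
]
output_pairs = [("product1", "product2"), ("product1", "product3")]

print(unexpected_pairs(input_pairs, output_pairs))
```

If this list comes back empty for the real data, the extra pairs do share
at least one user in the input, which would point at the generated dataset
rather than at the job.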

--
View this message in context: 
http://lucene.472066.n3.nabble.com/ItemSimilarityJob-Cooccurrence-Question-tp3024516p3024516.html
Sent from the Mahout User List mailing list archive at Nabble.com.
