Hi, I'm trying to compute item similarity using ItemSimilarityJob.
Here are my counts:
input dataset (user_id, item_id, pref): 16M rows
distinct items: 700K
distinct users: 4M

Users bucketed by number of preferences:
count_of_preferences, count_of_users
1                                   2M
2                                   600K
3                                   300K
4                                   300R
......

threshold: 0.91
similarityClassname=PEARSON

It returns ~2000 rows for ~1000 distinct items.
What am I doing wrong?
