Hi,

I'm trying to calculate item similarity using ItemSimilarityJob. Here are my counts:

input dataset (user_id, item_id, pref): 16M
distinct items: 700K
distinct users: 4M

Preferences per user, bucketed:

count_of_preferences    count_of_users
1                       2M
2                       600K
3                       300K
4                       300R
......

threshold: 0.91
similarityClassname=PEARSON

It returns ~2000 rows for ~1000 distinct items. What am I doing wrong?
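To put the bucket counts in perspective, here is a rough back-of-the-envelope sketch in Python. It is plain arithmetic, not Mahout code, and it only uses the first three buckets listed (the rest are elided). It assumes that an item pair can only be compared when some user has a preference for both items, so a user with n preferences contributes at most n(n-1)/2 item pairs. The names `users_per_bucket` and `item_pairs` are made up for illustration.

```python
# Users per preference-count bucket, taken from the counts in this post.
# Only the first three buckets are used; later buckets are elided.
users_per_bucket = {1: 2_000_000, 2: 600_000, 3: 300_000}


def item_pairs(n_prefs: int) -> int:
    """Upper bound on item pairs one user with n_prefs preferences can contribute."""
    return n_prefs * (n_prefs - 1) // 2


# Upper bound on (not necessarily distinct) co-occurring item pairs per bucket.
pairs_per_bucket = {
    n_prefs: item_pairs(n_prefs) * n_users
    for n_prefs, n_users in users_per_bucket.items()
}
total_pairs = sum(pairs_per_bucket.values())

for n_prefs, pairs in pairs_per_bucket.items():
    print(f"{n_prefs} preference(s): at most {pairs} item pairs")
print(f"total for these buckets: at most {total_pairs}")
```

Under that assumption, the 2M users with a single preference contribute no item pairs at all, so roughly half of the users cannot affect the result.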
