Hi Arnau,

I had a look at your ratings file and it's kind of strange. It's pretty tiny (770k ratings, 8 MB), but it has more than 250k distinct items. Of these, only 50k have more than 3 interactions.

So I think the first thing you should do is throw out all the items with so few interactions. Item similarity computations are pretty sensitive to the number of unique items; maybe that's why you don't see much difference in the run times.
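A minimal pre-filter could look something like the sketch below. This assumes the ratings file is a plain user,item,rating list; the function name, threshold, and format are just assumptions for illustration, so adjust them to your actual data:

```python
# Hypothetical pre-filter: drop items with too few interactions before
# feeding the data to spark-itemsimilarity. The "user,item,rating" row
# layout and the threshold of 4 are assumptions, not your actual setup.
from collections import Counter

def filter_rare_items(rows, min_interactions=4):
    """Keep only rows whose item has at least min_interactions ratings."""
    counts = Counter(item for _, item, _ in rows)  # interactions per item
    return [row for row in rows if counts[row[1]] >= min_interactions]

if __name__ == "__main__":
    rows = [
        ("u1", "i1", "5"), ("u2", "i1", "4"),
        ("u3", "i1", "3"), ("u4", "i1", "2"),
        ("u1", "i2", "1"),  # i2 has only one interaction, so it is dropped
    ]
    for row in filter_rare_items(rows):
        print(",".join(row))
```

On a file this small you could do the whole thing in one pass like this; for bigger inputs the same counting step is a trivial Spark job.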

-s


On 29.09.2016 22:17, Arnau Sanchez wrote:
 --input ratings --output spark-itemsimilarity --maxSimilaritiesPerItem 10
