Hi Arnau,
I had a look at your ratings file and it's kind of strange. It's pretty
tiny (770k ratings, 8MB), but it has more than 250k distinct items. Out
of these, only 50k have more than 3 interactions.
So I think the first thing you should do is throw out all the items
with so few interactions. Item similarity computations are pretty
sensitive to the number of unique items, so maybe that's why you don't
see much difference in the run times.
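In case it helps, here's a minimal sketch of that pre-filtering step in plain Python (the (user, item) pair format and the "more than 3 interactions" threshold are my assumptions based on what I saw in the file; adapt to your actual column layout):

```python
from collections import Counter

def filter_rare_items(ratings, min_interactions=4):
    """Keep only ratings whose item has at least min_interactions
    interactions.

    `ratings` is a list of (user, item) pairs -- a guess at your
    format; min_interactions=4 implements "more than 3 interactions".
    """
    counts = Counter(item for _, item in ratings)
    return [(u, i) for u, i in ratings if counts[i] >= min_interactions]

# Example: item "a" has 4 interactions and survives; "b" has 1 and is dropped.
ratings = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u4", "a"), ("u1", "b")]
kept = filter_rare_items(ratings)
```

Run that once over the file before feeding it to spark-itemsimilarity and you should see the unique-item count drop from ~250k to ~50k.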
-s
On 29.09.2016 22:17, Arnau Sanchez wrote:
--input ratings --output spark-itemsimilarity --maxSimilaritiesPerItem 10