Hi Arnau,

I had a look at your ratings file and it's kind of strange. It's pretty tiny (770k ratings, 8 MB), but it has more than 250k distinct items. Of these, only 50k have more than 3 interactions.

So I think the first thing you should do is throw out all the items with so few interactions. Item similarity computations are pretty sensitive to the number of unique items; maybe that's why you don't see much difference in the run times.
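A minimal pre-filter could look something like the sketch below. This assumes the ratings file is a plain user,item,rating list; the function name, threshold, and format are just assumptions for illustration, so adjust them to your actual data:

```python
# Hypothetical pre-filter: drop items with too few interactions before
# feeding the data to spark-itemsimilarity. The "user,item,rating" row
# layout and the threshold of 4 are assumptions, not your actual setup.
from collections import Counter

def filter_rare_items(rows, min_interactions=4):
    """Keep only rows whose item has at least min_interactions ratings."""
    counts = Counter(item for _, item, _ in rows)  # interactions per item
    return [row for row in rows if counts[row[1]] >= min_interactions]

if __name__ == "__main__":
    rows = [
        ("u1", "i1", "5"), ("u2", "i1", "4"),
        ("u3", "i1", "3"), ("u4", "i1", "2"),
        ("u1", "i2", "1"),  # i2 has only one interaction, so it is dropped
    ]
    for row in filter_rare_items(rows):
        print(",".join(row))
```

On a file this small you could do the whole thing in one pass like this; for bigger inputs the same counting step is a trivial Spark job.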

-s


On 29.09.2016 22:17, Arnau Sanchez wrote:
 --input ratings --output spark-itemsimilarity --maxSimilaritiesPerItem 10
