Hi Pat,
Thanks so much for your detailed response.

At the moment we do not have any metadata about the articles, just their title and body. In addition, the dataset contains tweets from the users. These tweets will never appear in the recommender's output (we never want to recommend that a user see a particular tweet), but we will use them to tune each user's preferences for different pieces of news, based on the similarity between the tweets they have produced and the news articles we have.

Would the approach you suggest with Solr still be valid in this particular scenario? We need the user preferences to be updated as soon as a user produces a new tweet, hence my urgency to recompute item similarities whenever a new tweet arrives. As you rightly pointed out, we do not need to recompute the similarity matrix every time a new piece of news is published.

I do not know if the approach I am about to suggest even makes sense, but my idea was to precompute the similarities between items (news + tweets) and store them along with the vectorized representation of every item. Then I would implement my own ItemSimilarity class, which would return the similarity for every pair of items, taking it from the precomputed matrix if available or calculating it on the fly if not. My main problem is that I do not know how to compute, in Mahout, the cosine distance between the vectorized representations of two particular items. Does this approach make sense in the first place?
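
For concreteness, here is a rough, untested sketch of the ItemSimilarity I have in mind. It assumes Mahout's Taste API (the exact method set of ItemSimilarity varies between versions) and two hypothetical in-memory maps, precomputed and itemVectors, standing in for however the matrix and the item vectors are actually stored:

import java.util.Collection;
import java.util.Map;

import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.common.distance.CosineDistanceMeasure;
import org.apache.mahout.math.Vector;

// Hypothetical sketch: looks up a precomputed similarity first, and
// falls back to computing cosine similarity from the stored vectors.
public class PrecomputedCosineItemSimilarity implements ItemSimilarity {

  private final Map<Long, Map<Long, Double>> precomputed; // hypothetical precomputed matrix
  private final Map<Long, Vector> itemVectors;            // hypothetical id -> vectorized item
  private final CosineDistanceMeasure cosine = new CosineDistanceMeasure();

  public PrecomputedCosineItemSimilarity(Map<Long, Map<Long, Double>> precomputed,
                                         Map<Long, Vector> itemVectors) {
    this.precomputed = precomputed;
    this.itemVectors = itemVectors;
  }

  @Override
  public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
    Map<Long, Double> row = precomputed.get(itemID1);
    if (row != null && row.containsKey(itemID2)) {
      return row.get(itemID2); // fast path: precomputed entry
    }
    Vector v1 = itemVectors.get(itemID1);
    Vector v2 = itemVectors.get(itemID2);
    if (v1 == null || v2 == null) {
      throw new TasteException("No vector stored for item " + itemID1 + " or " + itemID2);
    }
    // CosineDistanceMeasure returns 1 - cos(v1, v2), so invert it back
    return 1.0 - cosine.distance(v1, v2);
  }

  @Override
  public double[] itemSimilarities(long itemID1, long[] itemID2s) throws TasteException {
    double[] result = new double[itemID2s.length];
    for (int i = 0; i < itemID2s.length; i++) {
      result[i] = itemSimilarity(itemID1, itemID2s[i]);
    }
    return result;
  }

  @Override
  public long[] allSimilarItemIDs(long itemID) {
    throw new UnsupportedOperationException(); // not needed for this sketch
  }

  @Override
  public void refresh(Collection<Refreshable> alreadyRefreshed) {
    // nothing to refresh in this in-memory sketch
  }
}

The fallback could equally be written directly on the math Vectors as v1.dot(v2) / (v1.norm(2) * v2.norm(2)), without going through CosineDistanceMeasure.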

Many thanks.
