Yes.

The batch training data should be updated as needed, but for some length of time 
the RowSimilarity model will remain valid and useful, even for brand-new queries 
built from articles that are not in the model. Remember, however, that the only 
items you will get in results are ones in the training data, so that will give 
you an indication of how often to update it.
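
To make that trade-off concrete, here is a toy sketch (invented data, not actual RowSimilarityJob output) showing that a query built from a brand-new article still works, but the result list can only ever contain items from the training batch:

```python
from collections import Counter
from math import sqrt

# Toy "model": term vectors for articles that were in the training batch.
# In practice these rows would come from the RowSimilarityJob output.
trained = {
    "article-1": Counter({"election": 3, "senate": 2}),
    "article-2": Counter({"football": 4, "league": 1}),
    "article-3": Counter({"election": 1, "ballot": 2}),
}

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similar(query_terms, k=2):
    """Rank trained articles against a query built from any article,
    even one that was not in the training batch."""
    q = Counter(query_terms)
    ranked = sorted(trained, key=lambda aid: cosine(q, trained[aid]), reverse=True)
    return ranked[:k]

# A brand-new article can still be used as a query, but the results
# can only be drawn from the training data.
print(similar(["election", "ballot", "recount"]))  # -> ['article-3', 'article-1']
```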

For a content-based recommender you should look at Solr. The rest of the thread 
is missing, but I think I also suggested that you could use it as the similarity 
engine, especially if you need immediacy of model updates. In this case you 
simply maintain an up-to-date Solr index of all articles and their metadata. 
The index can be maintained in real time, or very close to it.

Once the data is in a form that Solr can index, you have a very flexible 
content-based recommender. For instance, you can create a query from the 
articles a user has read, along with their metadata, like category, location, 
etc. Or you may know something from the user's profile, past usage, or browsing 
context that allows you to boost results using this metadata.
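
As a sketch of what such a query might look like (the field names `body`, `category`, and `location` are hypothetical; substitute whatever your Solr schema defines), using Solr's edismax query and boost-query (`bq`) parameters:

```python
from urllib.parse import urlencode

def content_query(read_article_texts, category=None, location=None):
    """Build Solr query params for a content-based recommendation:
    the main query comes from articles the user read; profile/context
    metadata boosts matching results. Field names are illustrative."""
    params = {
        "q": " ".join(read_article_texts),
        "defType": "edismax",
        "qf": "body",          # search the article text field
        "rows": 10,
    }
    boosts = []
    if category:
        boosts.append(f"category:{category}^2.0")   # prefer same category
    if location:
        boosts.append(f"location:{location}^1.5")   # prefer nearby stories
    if boosts:
        params["bq"] = " ".join(boosts)
    return params

params = content_query(["election recount senate"], category="politics")
print(urlencode(params))  # ready to append to /select? on your Solr endpoint
```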

The collaborative filtering recommender that uses Solr + Mahout can seamlessly 
include metadata (content-based data) to calculate recs. For instance, on the 
demo site we have Videos with genre data. When a user is looking at a Video 
that has genre tags, these can be included in the query. A simple CF query 
would be a list of the Videos the user preferred, matched against the 
RowSimilarityJob (RSJ) created model. With Solr we can add multiple fields to 
the query, so by matching the current Video's genre tags against other Videos' 
genres you get genre-boosted CF recs.
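
A minimal sketch of composing such a multi-field query, assuming an `indicators` field holding the RSJ-derived similarity data and a `genres` field (both field names are illustrative, not the demo site's actual schema):

```python
def cf_query(preferred_video_ids, current_genres):
    """Build a Solr query mixing a CF part (the user's preferred Videos,
    matched against the indicator field derived from RowSimilarityJob
    output) with a content boost on the current Video's genres.
    Field names are illustrative."""
    q_parts = []
    if preferred_video_ids:
        q_parts.append("indicators:(%s)" % " ".join(preferred_video_ids))
    if current_genres:
        # the genre clause is weighted below the CF clause so content
        # boosts the ranking rather than dominating it
        q_parts.append("genres:(%s)^0.5" % " ".join(current_genres))
    return {"q": " ".join(q_parts), "rows": 10}

print(cf_query(["v12", "v7"], ["comedy", "drama"]))
```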

You should be able to use the same technique with a purely content based 
recommender.

On Feb 15, 2014, at 1:37 PM, Juanjo Ramos <[email protected]> wrote:

Hi Pat,
Thanks for your comment, I found it quite helpful. 
I'm also trying to build a content-based recommender. One question though: 
how can I use RowSimilarityJob for online data? I mean, I have a dataset and 
the approach you describe works pretty well to precompute the similarity 
matrix. However, when I get new content in my dataset (it is a dataset of 
news), how can I compute the similarity of only that new item against the 
rest without recomputing the whole matrix?

Many thanks.
