When we do cooccurrence recs with a search engine, we index:

    itemID, list-of-indicator-items

Then we search on the indicator field, using the user's item history as the query.
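
A minimal sketch of that lookup, in plain Python rather than a real search engine. The item IDs, the `index` dict and the `recommend` helper are all made up for illustration, and scoring is a plain count of matching indicators; a real search engine would apply its own relevance ranking instead.

```python
# Toy stand-in for the search index: itemID -> list-of-indicator-items.
# All IDs here are hypothetical.
index = {
    "ipad":    ["iphone", "macbook"],
    "galaxy":  ["nexus", "pixel"],
    "airpods": ["iphone", "ipad"],
    "macbook": ["ipad", "imac"],
}

def recommend(history, index, k=10):
    """Rank items by how many of their indicators appear in the user's history."""
    seen = set(history)
    scores = {}
    for item, indicators in index.items():
        if item in seen:
            continue  # skip items the user already has
        hits = len(seen.intersection(indicators))
        if hits:
            scores[item] = hits
    # Most matching indicators first; ties broken by item ID.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

print(recommend(["iphone", "ipad"], index))
```

The user history plays the role of the query terms, and the indicator field plays the role of the document text.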

Could we use a similar approach for content-based recs? Imagine a content site 
where we have run the text through a pipeline that narrows the input to important 
tokens (Lucene analyzer + LLR with a threshold of some kind). This then goes into 
RowSimilarity.

Input:

    docID, list-of-important-terms

Output:

    docID, list-of-similar-docs
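
To make the input/output shape concrete, here is a rough sketch of the similarity step. It is not the RowSimilarity implementation: Jaccard overlap is swapped in as a simple stand-in for whatever measure RowSimilarity applies, and the `row_similarity` function, its `threshold` and `max_similar` parameters, and the doc IDs are assumptions for illustration.

```python
from itertools import combinations

def row_similarity(doc_terms, threshold=0.25, max_similar=10):
    """Map docID -> list of similar docIDs, most similar first.

    Similarity is Jaccard overlap of the important-term sets,
    used here only as a stand-in measure.
    """
    term_sets = {doc: set(terms) for doc, terms in doc_terms.items()}
    scored = {doc: [] for doc in term_sets}
    for a, b in combinations(sorted(term_sets), 2):
        union = term_sets[a] | term_sets[b]
        if not union:
            continue  # two empty docs: nothing to compare
        sim = len(term_sets[a] & term_sets[b]) / len(union)
        if sim >= threshold:
            scored[a].append((sim, b))
            scored[b].append((sim, a))
    return {
        doc: [other for _, other in
              sorted(pairs, key=lambda p: (-p[0], p[1]))[:max_similar]]
        for doc, pairs in scored.items()
    }

# Hypothetical input: docID -> list-of-important-terms.
doc_terms = {
    "doc1": ["lucene", "analyzer", "index"],
    "doc2": ["lucene", "index", "query"],
    "doc3": ["spark", "rdd"],
}
similar = row_similarity(doc_terms)
print(similar)
```

The all-pairs loop is quadratic in the number of docs, so this only illustrates the data flow, not how it would be done at scale.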

Then index the list-of-similar-docs and query with the user's doc history. The 
idea is to personalize the content-based recs rather than just show "docs like 
this one".
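
A small sketch of the difference between the two kinds of result, assuming the similar-docs lists already exist. The `similar_docs` data and both helper functions are hypothetical, and counting matches is only a stand-in for search-engine scoring.

```python
# Hypothetical indexed field: docID -> list-of-similar-docs.
similar_docs = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

def docs_like(doc_id, similar_docs):
    """Non-personalized: the similar docs of a single document."""
    return similar_docs.get(doc_id, [])

def personalized(history, similar_docs, k=5):
    """Rank unread docs by how many docs from the history appear in their similar-docs field."""
    read = set(history)
    hits = {
        doc: len(read.intersection(neighbours))
        for doc, neighbours in similar_docs.items()
        if doc not in read
    }
    ranked = sorted((d for d in hits if hits[d] > 0),
                    key=lambda d: (-hits[d], d))
    return ranked[:k]

print(docs_like("a", similar_docs))
print(personalized(["a", "d"], similar_docs))
```

With a history of "a" and "d", doc "c" ranks first because both read docs appear in its similar-docs list, and "e" shows up even though it is not in the "docs like a" list.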
