When we do cooccurrence recs with a search engine we index:
itemID, list-of-indicator-items
Then we search the indicator field, using the user's item history as the query.
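As a rough sketch of that lookup (not the actual search engine query), the toy Python below scores each indexed item by how many of its indicator items appear in the user's history. The item IDs, the `index` dict, and the `recommend` helper are all made up for illustration, and a real engine would rank with something like TF-IDF or BM25 rather than a raw overlap count.

```python
from collections import Counter

# Hypothetical toy index: itemID -> list-of-indicator-items
index = {
    "item_a": ["item_b", "item_c"],
    "item_b": ["item_a", "item_d"],
    "item_c": ["item_a"],
    "item_d": ["item_b", "item_c"],
}

def recommend(index, history, k=10):
    """Rank items by how many of their indicators occur in the user's history."""
    seen = set(history)
    scores = Counter()
    for item_id, indicators in index.items():
        if item_id in seen:
            continue  # do not recommend what the user already has
        overlap = len(seen.intersection(indicators))
        if overlap:
            scores[item_id] = overlap
    # Highest overlap first; ties broken by item ID for a stable order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked][:k]

recs = recommend(index, ["item_b", "item_c"])
```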
Could we use a similar approach for content-based recs? Imagine a content site
where we have run the text through a pipeline that narrows the input to its
important tokens (Lucene analyzer + LLR with a threshold of some kind). This
then goes into RowSimilarity.
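To make the "LLR with threshold" step concrete, here is a minimal sketch of the log-likelihood ratio test on a 2x2 contingency table, using the usual entropy formulation. The `important_terms` helper, the way the table is built from per-doc and corpus term counts, and the threshold of 10.0 are assumptions for illustration only, and tokenization (the analyzer step) is taken as already done.

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise

def important_terms(doc_counts, corpus_counts, threshold=10.0):
    """Keep terms that are over-represented in the doc and score above threshold."""
    doc_total = sum(doc_counts.values())
    rest_total = sum(corpus_counts.values()) - doc_total
    kept = []
    for term, k11 in doc_counts.items():
        k12 = doc_total - k11            # other terms in this doc
        k21 = corpus_counts[term] - k11  # this term in the rest of the corpus
        k22 = rest_total - k21           # other terms in the rest of the corpus
        over_represented = k11 * (k21 + k22) > k21 * (k11 + k12)
        if over_represented and llr(k11, k12, k21, k22) >= threshold:
            kept.append(term)
    return sorted(kept)

# Hypothetical counts: "lucene" is rare in the corpus but frequent in this doc
doc_counts = {"lucene": 5, "the": 5}
corpus_counts = {"lucene": 6, "the": 500, "other": 494}
kept_terms = important_terms(doc_counts, corpus_counts)
```

The terms that survive the threshold would make up the list-of-important-terms for each doc.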
Input:
docID, list-of-important-terms
Output:
docID, list-of-similar-docs
Then index the list-of-similar-docs and query with the user's doc history. The
idea is to personalize the content-based recs rather than just show "docs like
this one".
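
A small end-to-end sketch of the idea, under loud assumptions: Jaccard overlap of term sets stands in for whatever similarity RowSimilarity actually computes, the doc IDs and terms are invented, and `row_similarity` and `personalize` are hypothetical helper names rather than anything from an existing library.

```python
from collections import Counter

def row_similarity(doc_terms, top_n=2):
    """docID -> list-of-similar-docs, ranked by Jaccard overlap (a stand-in measure)."""
    sets = {doc_id: set(terms) for doc_id, terms in doc_terms.items()}
    similar = {}
    for doc_id, terms in sets.items():
        scored = []
        for other_id, other_terms in sets.items():
            if other_id == doc_id:
                continue
            union = terms | other_terms
            score = len(terms & other_terms) / len(union) if union else 0.0
            if score > 0:
                scored.append((score, other_id))
        scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties by docID
        similar[doc_id] = [other_id for _, other_id in scored[:top_n]]
    return similar

def personalize(similar, history, k=5):
    """Query the similar-docs field with the user's doc history."""
    seen = set(history)
    scores = Counter()
    for doc_id, sim_docs in similar.items():
        if doc_id in seen:
            continue  # skip docs the user has already read
        hits = len(seen.intersection(sim_docs))
        if hits:
            scores[doc_id] = hits
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked][:k]

# Hypothetical input: docID -> list-of-important-terms
doc_terms = {
    "d1": ["lucene", "search", "index"],
    "d2": ["lucene", "search", "query"],
    "d3": ["hadoop", "mapreduce"],
    "d4": ["search", "query", "ranking"],
}
similar = row_similarity(doc_terms)
recs = personalize(similar, ["d1"])
```

With a longer history, docs that are similar to several things the user has read rise to the top, which is the difference from a plain "docs like this one" lookup on a single doc.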