Agreed. The bag-of-words case is only the most obvious one, and you calculate 
indicators in a slightly different way. In retrospect it is probably better to 
filter terms with an analyzer, then LLR, and use cosine on the TF-IDF weights for 
doc-doc similarity indicators.
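A minimal sketch of the doc-doc step, assuming the terms have already been filtered (analyzer plus an LLR threshold) into per-document token lists. It weights terms with TF-IDF using the common log(N/df) IDF and ranks other documents by cosine similarity; the top k per document become that document's similarity indicators. The function names and IDF variant are illustrative assumptions, not Mahout's RowSimilarity code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict docID -> list of already-filtered terms. Returns sparse TF-IDF dicts."""
    n = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # document frequency: count each term once per doc
    return {
        doc_id: {t: c * math.log(n / df[t]) for t, c in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_docs(docs, k=2):
    """Return docID -> up to k most similar docIDs (the doc-doc indicators)."""
    vectors = tfidf_vectors(docs)
    result = {}
    for doc_id, vec in vectors.items():
        scored = [(cosine(vec, other), other_id)
                  for other_id, other in vectors.items() if other_id != doc_id]
        scored = [(s, o) for s, o in scored if s > 0]  # drop unrelated docs
        scored.sort(key=lambda p: (-p[0], p[1]))
        result[doc_id] = [o for _, o in scored[:k]]
    return result


docs = {
    "d1": ["spark", "cluster", "scala"],
    "d2": ["spark", "scala", "streaming"],
    "d3": ["recipe", "pasta", "tomato"],
    "d4": ["pasta", "tomato", "basil"],
}
print(similar_docs(docs))
# → {'d1': ['d2'], 'd2': ['d1'], 'd3': ['d4'], 'd4': ['d3']}
```

Note that a term present in every document gets an IDF of zero and so contributes nothing to similarity, which is one reason TF-IDF weighting helps here.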

You can blend in popularity, “hotness”, or any other feature too. These can be 
used directly as indicators or combined. For instance, the demo app uses genres 
as tags, and the collection of genres is attached to the items as an indicator. 
The query can boost the genre indicator while using CF indicators too. This 
would skew recs towards certain genres. When showing personalized recs on an 
item page, the app does this kind of skewing all in one query.
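One way such a single query could look, sketched as an Elasticsearch-style query body built in Python. The field names `cf_indicators` and `genres` are hypothetical, and the CF clause assumes the indicator field is whitespace-tokenized so that space-joined item IDs match individually. The genre clause uses a `terms` query, which gives a constant, boostable score per matching item; the CF clause uses `match` so that items overlapping more of the history score higher.

```python
import json


def build_query(user_history, boosted_genres, genre_boost=2.0, size=10):
    """Return a hypothetical Elasticsearch bool query mixing CF and genre clauses."""
    should = [
        # CF clause: the user's item history against each item's CF indicators.
        {"match": {"cf_indicators": {"query": " ".join(user_history)}}},
    ]
    if boosted_genres:
        # Metadata clause: favored genres, weighted by genre_boost to skew recs.
        should.append({"terms": {"genres": list(boosted_genres), "boost": genre_boost}})
    return {"size": size, "query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(build_query(["i1", "i2"], ["comedy"]), indent=2))
```

Raising `genre_boost` skews results further towards the favored genres without dropping the CF signal, since both are `should` clauses whose scores add.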

A query could be constructed to, in effect, fall back from, say, CF to content or 
metadata gracefully, depending on the data available for the user or item. So a 
new item can still be recommended, and an anonymous user will get recs even 
if they are only minimally personalized (by demographics or location, for 
instance). It embodies a recommender that gracefully handles the cold-start 
problem, all the while making warmed-up, data-rich recs even better. 
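The fallback behavior can be illustrated without a search engine. This toy scorer, an assumption standing in for a real engine's additive `should`-clause scoring, sums weighted overlaps per field and lets any field with no query data drop out. The field names (`cf`, `loc`, `genre`) and weights are hypothetical.

```python
def score_items(items, signals, weights):
    """
    items:   dict itemID -> dict field -> set of indicator tokens
    signals: dict field -> set of query tokens (empty when that data is unknown)
    weights: dict field -> clause weight
    Returns itemIDs ranked by the summed, weighted overlap across fields.
    """
    scores = {}
    for item_id, fields in items.items():
        score = 0.0
        for field, tokens in signals.items():
            if not tokens:
                continue  # no data for this signal, so the clause simply drops out
            overlap = len(tokens & fields.get(field, set()))
            score += weights.get(field, 1.0) * overlap
        scores[item_id] = score
    return sorted(scores, key=lambda i: (-scores[i], i))


items = {
    "a": {"cf": {"x", "y"}, "loc": {"NYC"}},
    "b": {"cf": {"y"}, "loc": {"SF"}},
    "c": {"cf": set(), "loc": {"SF"}, "genre": {"comedy"}},  # new item, no CF data yet
}
weights = {"cf": 1.0, "loc": 0.5, "genre": 0.5}

known_user = {"cf": {"x", "y"}, "loc": {"SF"}}
anonymous_user = {"cf": set(), "loc": {"SF"}}
print(score_items(items, known_user, weights))      # CF dominates the ranking
print(score_items(items, anonymous_user, weights))  # falls back to location only
```

The same query shape serves both users; only the data present changes which clauses contribute, and the new item `c` can still surface through its metadata.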

Putting all this together sure seems like a conceptual breakthrough. How many 
times have you seen these cases solved ad hoc? How many recommenders handle 
them?

BTW, I'm not sure I follow when two queries are needed. If you are talking about 
cross-action indicators, where you are measuring, say, all users’ history of 
preferring a tag and crossing that with the primary action, that can be 
pre-digested into an indicator and attached to items before the query, right?
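A sketch of pre-digesting such cross-action indicators offline, assuming Dunning's log-likelihood ratio (the unnormalized-entropy form) as the test. For each item from the primary action and each tag from the secondary action, it builds the 2x2 user contingency table and keeps tags scoring above a threshold. The positive-association check and all names are illustrative assumptions.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized entropy used in Dunning's LLR formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def cross_indicators(primary, secondary, threshold):
    """
    primary:   dict user -> set of items acted on (e.g. purchases)
    secondary: dict user -> set of preferred tags
    Returns dict item -> tags that co-occur with it anomalously often.
    """
    users = set(primary) | set(secondary)
    n = len(users)
    item_users, tag_users = {}, {}
    for u in users:
        for i in primary.get(u, ()):
            item_users.setdefault(i, set()).add(u)
        for t in secondary.get(u, ()):
            tag_users.setdefault(t, set()).add(u)
    result = {}
    for item, iu in item_users.items():
        keep = []
        for tag, tu in tag_users.items():
            k11 = len(iu & tu)          # users who did both
            if k11 == 0:
                continue
            k12 = len(iu) - k11         # item but not tag
            k21 = len(tu) - k11         # tag but not item
            k22 = n - k11 - k12 - k21   # neither
            if k11 * k22 <= k12 * k21:
                continue  # LLR is two-sided; keep only positive associations
            score = llr(k11, k12, k21, k22)
            if score > threshold:
                keep.append((score, tag))
        keep.sort(key=lambda p: (-p[0], p[1]))
        result[item] = [t for _, t in keep]
    return result


primary = {"u1": {"i1"}, "u2": {"i1"}, "u3": {"i2"}, "u4": {"i2"}}
secondary = {"u1": {"scifi"}, "u2": {"scifi"}, "u3": {"cooking"}, "u4": {"cooking"}}
print(cross_indicators(primary, secondary, threshold=1.0))
```

The resulting tag lists would be indexed as another field on each item, so the query only needs the user's tag preferences alongside their item history.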


On Sep 14, 2014, at 10:58 PM, Ted Dunning <[email protected]> wrote:

Yes.  This can work, particularly for content tags (like demographics) on
users.  For content tags on documents, you typically have an extra
retrieval.  First you recommend tags, then you search for the top
recommended tags.  Tags will have indicators in this sort of case, just
like items would normally.  Tags are, of course, not tags in the normal
sense, but any content-based feature of an item, or possibly even
combinations of content-based features.
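A toy sketch of the extra retrieval described above, under the assumption that each tag already has indicators (the items whose consumption signals interest in that tag). Stage one recommends tags by matching the user's history against the tag indicators; stage two searches documents for the top recommended tags, weighting by tag rank and skipping documents already seen. All names and the rank weighting are hypothetical.

```python
from collections import Counter


def recommend_tags(user_history, tag_indicators, top_n=2):
    """Stage 1: score each tag by how much of the user's history is in its indicators."""
    history = set(user_history)
    scores = {tag: len(history & set(items)) for tag, items in tag_indicators.items()}
    ranked = sorted((t for t in scores if scores[t] > 0), key=lambda t: (-scores[t], t))
    return ranked[:top_n]


def retrieve_docs(top_tags, doc_tags, exclude=()):
    """Stage 2: search documents for the recommended tags; higher-ranked tags weigh more."""
    weights = {tag: len(top_tags) - rank for rank, tag in enumerate(top_tags)}
    scores = Counter()
    for doc, tags in doc_tags.items():
        if doc in exclude:
            continue
        for tag in tags:
            scores[doc] += weights.get(tag, 0)
    return [d for d, s in sorted(scores.items(), key=lambda p: (-p[1], p[0])) if s > 0]


tag_indicators = {"ml": ["d1", "d2"], "cooking": ["d3"], "travel": ["d4", "d5"]}
user_history = ["d1", "d2", "d3"]
doc_tags = {"d1": ["ml"], "d2": ["ml"], "d3": ["cooking"],
            "d6": ["ml", "travel"], "d7": ["cooking"], "d8": ["travel"]}

top_tags = recommend_tags(user_history, tag_indicators)
print(top_tags)
print(retrieve_docs(top_tags, doc_tags, exclude=set(user_history)))
```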




On Thu, Aug 28, 2014 at 11:16 AM, Pat Ferrel <[email protected]> wrote:

> When we do cooccurrence recs with a search engine, we index:
> 
>    itemID, list-of-indicator-items
> 
> Then we search the indicator field with the user's item history.
> 
> Could we use a similar approach for content-based recs? Imagine a content
> site where we have run the text through a pipeline that narrows input to
> important tokens (a Lucene analyzer + LLR with a threshold of some kind).
> Then this goes into RowSimilarity.
> 
> Input:
> docID, list-of-important-terms
> 
> Output:
> docID, list-of-similar-docs
> 
> Then index the list-of-similar-docs and query with the user's doc history.
> The idea is to personalize the content-based recs rather than just show
> "docs like this one".
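The proposal in the quoted message can be sketched in memory: treat each doc's list-of-similar-docs as an indexed field, invert it, and query it with the user's doc history, skipping docs already read. The smoothed IDF, log(1 + N/df), is an illustrative stand-in for a search engine's own TF-IDF or BM25 scoring, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def build_index(similar_docs):
    """Invert docID -> list-of-similar-docs into indicator token -> set of docIDs."""
    index = defaultdict(set)
    for doc_id, sims in similar_docs.items():
        for s in sims:
            index[s].add(doc_id)
    return index


def personalized_recs(similar_docs, user_history, k=3):
    """Query the similar-docs field with the user's history; unseen docs only."""
    index = build_index(similar_docs)
    n = len(similar_docs)
    seen = set(user_history)
    scores = defaultdict(float)
    for token in seen:
        postings = index.get(token, set())
        if not postings:
            continue
        idf = math.log(1 + n / len(postings))  # rarer indicators count more
        for doc_id in postings:
            if doc_id not in seen:
                scores[doc_id] += idf
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


similar = {
    "d1": ["d2", "d3"],
    "d2": ["d1", "d3"],
    "d3": ["d1", "d2"],
    "d4": ["d5"],
    "d5": ["d4", "d1"],
}
print(personalized_recs(similar, ["d1", "d2"]))
```

Because every doc in the history contributes, a doc similar to several things the user read (here `d3`) outranks one similar to only a single item, which is the personalization that "docs like this one" lacks.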
