Agreed. The bag-of-words case is only the most obvious one, and you calculate the indicators in a slightly different way. In retrospect, it is probably better to filter terms with an analyzer and then LLR, and use cosine similarity on the TF-IDF weights for doc-doc similarity indicators.
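A minimal sketch of the doc-doc step in plain Python, assuming the terms have already been narrowed by an analyzer and LLR. It uses raw term count times log IDF (one of several common TF-IDF variants) and brute-force cosine over all pairs, which is what a distributed job like RowSimilarity would do at scale. The function names and toy corpus are hypothetical, not Mahout's API.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: docID -> list of terms already narrowed by an analyzer/LLR step.
    Returns docID -> {term: tf * idf} as a sparse dict."""
    n_docs = len(docs)
    # document frequency: how many docs contain each term
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        # raw term count times log inverse document frequency
        # (a term present in every doc gets weight 0)
        vectors[doc_id] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_docs(vectors, doc_id, k=10):
    """Brute-force top-k most similar docs; drops docs with zero similarity."""
    target = vectors[doc_id]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != doc_id]
    scores = [(other, s) for other, s in scores if s > 0]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# toy corpus; "d3" shares no terms with the others
docs = {
    "d1": ["mahout", "cooccurrence", "llr", "search"],
    "d2": ["mahout", "llr", "solr"],
    "d3": ["recipe", "pasta", "tomato"],
}
vectors = tfidf_vectors(docs)
# docID -> list-of-similar-docs, ready to be indexed as an indicator field
indicators = {doc_id: [other for other, _ in similar_docs(vectors, doc_id)]
              for doc_id in docs}
# indicators == {"d1": ["d2"], "d2": ["d1"], "d3": []}
```

The resulting `indicators` map has the same shape as the cooccurrence output (`docID, list-of-similar-docs`), so it can be indexed and queried with a user's doc history the same way.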
You can blend in popularity, "hotness", or any other feature too. These can be used directly as indicators or combined. For instance, the demo app uses genres as tags, and the collection of genres is attached to each item as an indicator. The query can boost the genre indicator while also using the CF indicators, which skews recs toward certain genres. When showing personalized recs on an item page, the app does this kind of skewing all in one query.

A query could be constructed to, in effect, fall back gracefully from, say, CF to content or metadata, depending on the data available for the user or item. So a new item can still be recommended, and an anonymous user will still get recs, even if they are only minimally personalized (by demographics or location, for instance). This embodies a recommender that gracefully handles the cold-start problem while making recs for warmed-up, data-rich users and items even better. Putting all this together sure seems like a conceptual breakthrough. How many times have you seen these cases solved ad hoc? How many recommenders handle them?

BTW, I'm not sure I follow when two queries are needed. If you are talking about cross-action indicators, where you measure, say, all users' history of preferring a tag and cross that with the primary action, that can be pre-digested into an indicator and attached to items before the query, right?

On Sep 14, 2014, at 10:58 PM, Ted Dunning wrote:

Yes. This can work, particularly for content tags (like demographics) on users. For content tags on documents, you typically need an extra retrieval: first you recommend tags, then you search for the top recommended tags. Tags will have indicators in this sort of case, just like items normally would. Tags are, of course, not tags in the normal sense, but any content-based feature of an item, or possibly even combinations of content-based features.
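The single blended query described in the reply above can be mimicked outside a search engine to see why the fallback works. This is a minimal sketch, not the demo app's code: the field names (`cf`, `genre`), boosts, and popularity weight are hypothetical, and scoring is a plain boost-times-match-count sum rather than Lucene's real relevance scoring. Because every clause only adds score, a user with no history contributes nothing to the `cf` clause and the remaining signals still rank items, while a new item with no CF indicators can still surface through its genre.

```python
def blended_recs(items, query, popularity=None, popularity_weight=0.0,
                 exclude=(), k=10):
    """Rank items by a boosted multi-field match against the query.

    items:      itemID -> {field name -> set of indicator values}
    query:      field name -> (values to match, boost)
    popularity: optional itemID -> popularity score, blended in as one more signal
    """
    popularity = popularity or {}
    scored = []
    for item_id, fields in items.items():
        if item_id in exclude:
            continue  # e.g. items the user already has
        score = 0.0
        for field, (values, boost) in query.items():
            # count how many query values appear in this item's indicator field
            matches = len(fields.get(field, set()) & set(values))
            score += boost * matches
        score += popularity_weight * popularity.get(item_id, 0.0)
        if score > 0:
            scored.append((item_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:k]]


items = {
    "i1": {"cf": {"a", "b"}, "genre": {"scifi"}},
    "i2": {"cf": {"a"}, "genre": {"drama"}},
    "i3": {"cf": set(), "genre": {"scifi"}},  # new item: no CF indicators yet
}
popularity = {"i1": 0.2, "i2": 0.9, "i3": 0.1}

# known user whose history is items "a" and "b", skewed toward scifi
warm = blended_recs(items, {"cf": (["a", "b"], 1.0), "genre": (["scifi"], 0.5)},
                    popularity, popularity_weight=0.1)
# anonymous user: no history, so only popularity contributes
cold = blended_recs(items, {}, popularity, popularity_weight=0.1)
# warm == ["i1", "i2", "i3"], cold == ["i2", "i1", "i3"]
```

In Solr or Elasticsearch the same idea would be expressed as optional (should) clauses on each indicator field with per-field boosts, so missing data simply drops a clause's contribution instead of failing the query.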
On Thu, Aug 28, 2014 at 11:16 AM, Pat Ferrel wrote:

> When we do cooccurrence recs with a search engine we index:
>
> itemID, list-of-indicator-items
>
> Then search on the indicator field with user item history.
>
> Could we use a similar approach for content-based recs? Imagine a content
> site where we have run the text through a pipeline that narrows input to
> important tokens (Lucene analyzer + LLR with threshold of some kind). Then
> this goes into RowSimilarity.
>
> Input:
> docID, list-of-important-terms
>
> Output:
> docID, list-of-similar-docs
>
> Then index the list-of-similar-docs and query with the user doc history.
> The idea is to personalize the content-based recs rather than just show
> "docs like this one".
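The "analyzer + LLR with threshold" step in the quoted message can be sketched as follows. The `llr` function follows the entropy form of Dunning's G² statistic for a 2x2 table; how the table is built for "important terms" (this doc versus the rest of the corpus) and the threshold of 10 are assumptions for illustration, and `important_terms` is a hypothetical helper, not a Mahout API.

```python
import math
from collections import Counter


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # unnormalized entropy: N log N - sum(k log k)
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # clamp tiny negative results caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))


def important_terms(doc_terms, corpus_counts, threshold=10.0):
    """Keep terms over-represented in one doc relative to the rest of the corpus.

    doc_terms:     analyzed tokens of one doc
    corpus_counts: Counter of tokens over the whole corpus, including this doc
    threshold:     LLR cutoff (hypothetical value; tune per corpus)
    """
    doc_counts = Counter(doc_terms)
    doc_total = sum(doc_counts.values())
    rest_total = sum(corpus_counts.values()) - doc_total
    keep = []
    for term, k11 in doc_counts.items():
        k12 = doc_total - k11               # other tokens in this doc
        k21 = corpus_counts[term] - k11     # this term in the rest of the corpus
        k22 = rest_total - k21              # other tokens in the rest of the corpus
        # LLR is high for over- and under-representation alike; keep only "over"
        over = k11 * rest_total > k21 * doc_total
        if over and llr(k11, k12, k21, k22) >= threshold:
            keep.append(term)
    return keep


doc = ["mahout"] * 5 + ["the"] * 5
rest = ["the"] * 500 + ["pasta"] * 500
corpus_counts = Counter(doc + rest)
print(important_terms(doc, corpus_counts))  # ['mahout']
```

Here "the" occurs at the same rate inside and outside the doc, so it is dropped, while "mahout" is concentrated in the doc and survives as a `list-of-important-terms` entry for the RowSimilarity input.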
