I've got a cross-recommender too. It was originally conceived as a 
multi-action ensemble based on Ted's notes. I'm now gathering a new data set 
and building the meta-model learner.

Even with scores on the same scale you still need to learn the weighting 
factors. Consider a simple ensemble case:

R_p is the matrix of all recommendations from an item-based recommender using 
history for a single action (purchase?)
S_p is the matrix of all similar items for a given item as measured by user 
actions (purchases?)

It is highly likely that you can improve your ranking (I also agree ratings 
are seldom useful) by incorporating both sets of data, so R = R_p + aS_p. This 
is the linear combo Ted mentions. But what is "a"? The scale of the strengths 
here is very similar but their importance is not. In my case I found that S_p 
was far more predictive of a future purchase than R_p, but without R_p every 
user sees the same recommendations for a given item. So some combo of the two 
may prove far stronger in practice than either alone. The combo has certainly 
produced a better precision score.
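To make the combination concrete, here is a minimal sketch of R = R_p + aS_p 
for a single row. All the item names, strengths, and the weight "a" are 
made-up illustrative values, not real data:

```python
# Sketch of the two-term combination R = R_p + a*S_p for one row.
# The row vectors and the weight "a" are illustrative assumptions.
r_p = {"itemA": 0.9, "itemB": 0.4, "itemC": 0.1}   # item-based recs from purchase history
s_p = {"itemB": 0.8, "itemC": 0.7, "itemD": 0.5}   # similar items by purchase cooccurrence
a = 0.6

# Combine over the union of items; missing entries count as zero strength.
items = set(r_p) | set(s_p)
combined = {i: r_p.get(i, 0.0) + a * s_p.get(i, 0.0) for i in items}

# Resort by combined strength to get the final ranking.
ranking = sorted(combined, key=combined.get, reverse=True)
```

Note that the union matters: items recommended by only one of the two sources 
still participate in the final ranking.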

Once you learn "a" things are much simpler, but to learn it you need an 
objective function like improving precision. Then you must find what weighting 
"a" gives the highest precision. This requires virtually all of R_p and S_p. 
Since you don't know "a" you don't know how much of the sorted R or S you need 
to combine. Using the Solr idea you would combine as many results as you can 
get from the queries, trying different weightings and measuring precision each 
time. In my case I weight the entire row vector, add the two, resort, and 
calculate precision @ some number of recs.
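A sketch of that loop, for a single row: grid-search "a", and for each 
candidate weight the row, add, resort, and score precision@k against held-out 
purchases. The grid, the rows, and the held-out set are all illustrative 
assumptions:

```python
# Sketch: grid search over "a", scoring each candidate by precision@k.
# Rows, held-out purchases, and the grid are illustrative assumptions.

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k recommendations that were actually purchased."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

def combine_and_rank(r_row, s_row, a):
    """Weight S_p by a, add to R_p, and resort the whole row."""
    items = set(r_row) | set(s_row)
    combined = {i: r_row.get(i, 0.0) + a * s_row.get(i, 0.0) for i in items}
    return sorted(combined, key=combined.get, reverse=True)

# One row of R_p and S_p, plus that user's held-out future purchases.
r_row = {"A": 0.9, "B": 0.4, "C": 0.1}
s_row = {"B": 0.8, "C": 0.7, "D": 0.5}
held_out = {"B", "C"}

# Try a = 0.0, 0.1, ..., 2.0 and keep the weight with the best precision@2.
best_a = max((a / 10 for a in range(0, 21)),
             key=lambda a: precision_at_k(combine_and_rank(r_row, s_row, a),
                                          held_out, 2))
```

In practice you would average precision@k over many users before picking the 
winning weight, rather than scoring one row.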

Implementing a simple form of gradient ascent seems like a good approach, 
because the full ensemble is going to have more than just "a" and will 
therefore need help converging across multiple dimensions.

The version of the cross-recommender I have considers item similarities as 
well as user-history-based recommendations, so we have an ensemble something 
like:

R = R_p + aS_p + bR_v + cS_v

where R_v and S_v come from the cooccurrences of views with purchases, not 
just views.
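A crude numerical gradient ascent over (a, b, c) might look like the sketch 
below. The objective here is a smooth stand-in with a known optimum, because 
real precision@k is piecewise-constant in the weights; everything named here 
is an illustrative assumption, not the actual learner:

```python
# Sketch: finite-difference gradient ascent over ensemble weights (a, b, c).
# precision_for(weights) is assumed to rebuild R = R_p + a*S_p + b*R_v + c*S_v,
# resort, and score it; the toy objective below is a smooth stand-in.

def gradient_ascent(precision_for, weights, step=0.1, eps=0.01, iters=100):
    w = list(weights)
    for _ in range(iters):
        # Forward-difference estimate of the gradient in each dimension.
        grad = []
        for d in range(len(w)):
            bumped = list(w)
            bumped[d] += eps
            grad.append((precision_for(bumped) - precision_for(w)) / eps)
        # Step uphill in all dimensions at once.
        w = [wi + step * gi for wi, gi in zip(w, grad)]
    return w

# Stand-in objective with a known optimum at (0.5, 1.0, 0.2); a real one
# would measure precision@k on held-out purchases.
def toy_precision(w):
    return -((w[0] - 0.5) ** 2 + (w[1] - 1.0) ** 2 + (w[2] - 0.2) ** 2)

a, b, c = gradient_ascent(toy_precision, [0.0, 0.0, 0.0])
```

Since precision@k is a step function of the weights, in practice the gradient 
can vanish between steps; a coarse grid first, or a rank-based smoothing of 
the objective, helps the ascent actually move.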

Ted's idea below about using the rank or log rank, instead of whatever the 
similarity metric returns, as the "strength" seems like a really good one. It 
evens out the scales and so may converge quicker, if nothing else. It's pretty 
easy to implement too.
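The rescaling itself is a one-liner per row. A minimal sketch, with made-up 
strengths standing in for whatever a similarity metric returns:

```python
# Sketch: replace raw similarity strengths with log-rank scores so every
# recommender's output lands on the same scale. Strengths are illustrative.
import math

raw = {"A": 412.7, "B": 35.2, "C": 1.3, "D": 0.4}   # wildly scaled strengths

ranked = sorted(raw, key=raw.get, reverse=True)
# Rank 1 is the strongest item; -log(rank) keeps "bigger is better".
log_rank = {item: -math.log(rank) for rank, item in enumerate(ranked, start=1)}
```

Whatever the original scale was, every source now contributes scores from the 
same -log(rank) family, so the ensemble weights only have to learn importance, 
not units.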


Repetition alert! Does anyone have a data set to try this on?



On May 31, 2013, at 1:15 PM, Koobas <[email protected]> wrote:

Ted,
Thank you very much. This is very insightful.
The log scaling is definitely an intuitive way of building the meta model.
Not much disagreement about the uselessness of predicting ratings.


On Fri, May 31, 2013 at 4:00 PM, Ted Dunning <[email protected]> wrote:

> In my case, I put all the indicators from all different sources in the same
> Solr/Lucene index.  Recommendations consists of making a single query to
> Solr/Lucene with as much data as I have or want to include.
> 
> At the point that this query is done, there are no weights on the
> indicators ... merely presence or absence in a field or query.  The weights
> that I typically use are computed on the fly by Lucene's default similarity
> score and the results tend to be very good.  There is no issue of combining
> scores on different scales since there is only one composite score.
> 
> If you *really* want to build multiple models using different technologies
> and combine them, you need a so-called meta-model.  There are many ways to
> build such a beast.  A very simple way is to reduce all scores to quantiles
> then to a log-odds scale (taking care not to ever estimate a quantile as
> either 0 or 1).  A linear combination of these rescaled scores can work
> pretty well although you do have to learn the linear weights.
> 
> Sometimes scores vary strongly from query to query.  In such cases,
> reducing a score to being some kind of rank statistic can be helpful. For
> instance, you may want to have a score that is the log of the rank that an
> item appears at in the results list.  You might also be able to normalize
> scores based on properties of the query. Such rank-based or normalized
> scores can then often be combined by any meta-model, including the one I
> mentioned above.
> 
> You should also look at the netflix papers, especially the one describing
> the winning entry for more ideas on model combination.  The major
> difference there is that they were trying to predict a rating which is a
> task that I find essentially useless since ranking is so much more
> important in most real-world applications.  Others may dispute my
> assessment on this, of course.
> 
> There are many ways of building the meta-model that you need, but one
> over-riding thought that I have is that the deviations from ideal in all
> real cases will be large enough that theory should not be taken too
> literally here, but rather should be used as a weak, though still useful,
> inspirational guide.
> 
> 
> On Fri, May 31, 2013 at 3:18 PM, Koobas <[email protected]> wrote:
> 
>> I am also very interested in the answer to this question.
>> Just to reiterate, if you use different recommenders, e.g.,
>> kNN user-based, kNN item-based, ALS, each one produces
>> recommendations on a different scale. So how do you combine them?
>> 
>> 
>> On Fri, May 31, 2013 at 3:07 PM, Dominik Hübner <[email protected]> wrote:
>> 
>>> Hey,
>>> I have implemented a cross recommender based on the approach Ted Dunning
>>> proposed (cannot find the original post, but here is a follow up
>>> http://www.mail-archive.com/[email protected]/msg12983.html).
>>> Currently I am struggling with the last step of blending the initial
>>> recommendations.
>>> 
>>> My current approach:
>>> 1. Compute a cooccurrence matrix for each useful combination of
>>> user-product interaction (e.g. which product views and purchased do
>> appear
>>> in common …)
>>> 2. Perform initial recommendation based on each matrix and the required
>>> type of user vector (e.g. a user's history of views OR purchases)
>>> (like the item-based recommender implemented in Mahout)
>>> 
>>> In step 2, I adapted the AggregateAndRecommendReducer of Mahout, which
>>> normalizes vectors while building the sum of weighted similarities, or
>>> in this case => cooccurrences.
>>> 
>>> Now I end up with multiple recommendations for each product, but all of
>>> them are on a different scale.
>>> How can I convert them to have the same scale, in order to be able to
>>> weight them and build the linear combinations of initial recommendations
>>> as Ted proposed?
>>> Would it make sense to normalize user vectors (before multiplying) as
>>> well?
>>> 
>>> Otherwise views would have a much higher influence than purchases due to
>>> their plain characteristics (they just appear way more frequently). Or is
>>> this the reason for weighting purchases higher and views lower? If so, I
>>> think it's sort of inconvenient. Wouldn't it be much more favorable to
>>> get each type of interaction within the same scale and use the weights
>>> just to control each type's influence on the final recommendation?
>>> 
>>> Thanks in advance for any suggestions!
>>> 
>>> 
>>> 
>>> Regards
>>> Dominik
>>> 
>>> Sent from my iPhone
