> On Aug 19, 2014, at 11:26 PM, Serega Sheypak <[email protected]> wrote:
>
> Hi!
> 1. There was a bug in the UI; I've checked the raw recommendations. "Water
> heating device" has a low score. So the first 30 recommended items really
> fit the iPhone; the next ones are not so good. Anyway the result is good,
> thank you very much.
> 2. I've inspected the "sessions" of users; there really are people who
> viewed the iPhone and a heating device: 10 people in the last month.
> 3. I will calculate a relative measurement. I didn't calculate what
> percentage these people are of the others, or how they influence the
> score result.
That’s great. The Spark version sorts the result by weights, but I think the
MapReduce version doesn't.

> *You wrote:*
> Then once you have that working you can add more actions but only with
> cross-cooccurrence; adding by weighting *will not work with this type of
> recommender*. Which recommender can work with weights for actions?

What you are doing is best practice for showing similar “views”. The
technique for using multiple actions will be covered in a series of blog
posts and may be put on the Mahout site eventually. It requires
spark-itemsimilarity. For now I’d strongly suggest you look at training on
purchase data alone - see the comments below.

> *About building recommendations using sales.*
> Sales are less than 1% of item views. You will recommend only stuff
> people buy.

The point is not the volume of data but the quality of data. I once measured
how predictive of purchases views were and found them a rather poor
predictor. People look at 100 things and buy 1, as you say. The question is:
do you want people to buy something, or just browse your site?

On the other hand, you need to see how good your coverage of purchases is.
Do you have enough items purchased by several people (Ted’s questions below
will guide you)? If there is good coverage then you do _not_ restrict the
range by using only purchase data; you increase the quality.

> If you recommend what people see, you significantly widen the range of
> possible buy actions. People always buy case "XXX" with the iPhone; you
> would never recommend them case "YYY". If people view both "XXX" and
> "YYY", it's reasonable to recommend "YYY". Maybe "YYY" is more expensive,
> which is why people prefer the cheaper "XXX". What's wrong with this
> assumption?

Nothing at all. Remember that your goal is to cause a purchase, but using
views requires some “scrubbing” of views. You want, in effect,
views-that-lead-to-purchases.
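As a toy illustration of what “views-that-lead-to-purchases” means (the
cross-cooccurrence machinery does this properly inside the recommender),
here is a minimal sketch in Python. The event format and field names are
hypothetical; a real pipeline would also respect session boundaries and
time order, not just user identity.

```python
# Sketch: keep only views from users who also purchased something.
# This is a coarse proxy for "views-that-lead-to-purchases"; the tuple
# layout (user, action, item) is an assumption, not a real log schema.

def scrub_views(events):
    """events: list of (user, action, item) tuples in time order."""
    # Users who produced at least one purchase.
    buyers = {user for user, action, _ in events if action == "purchase"}
    # Keep a view only when the same user bought something.
    return [(u, a, i) for u, a, i in events
            if a == "view" and u in buyers]

events = [
    ("u1", "view", "iphone"),
    ("u1", "view", "case-XXX"),
    ("u1", "purchase", "case-XXX"),
    ("u2", "view", "water-heater"),  # u2 never bought anything
]
print(scrub_views(events))
# -> [('u1', 'view', 'iphone'), ('u1', 'view', 'case-XXX')]
```

The scrubbed views could then feed the view side of a cross-cooccurrence
training run while purchases remain the primary action.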
In a cooccurrence recommender this can be done with cross-cooccurrence. I’ll
describe that elsewhere; it’s too long for an email, but it’s pretty easy to
use. I’d wager that if you restrict to purchases, your sales will go up
compared to recommending from views. But that is without looking at your
data.

If you need more data, try increasing the sliding time window to add more
purchases. This will eventually start including things that are no longer in
your catalog and so will have diminishing returns, but 60 days seems like a
short time period. Filter out any items not in the catalog from your
recommendations. You want recency to matter; this is good intuition. The
in-catalog filter is one simple way, and there are others when you get to
personalization.

> *About our obsessive desire to add weights for actions.*
> We would like to self-tune our recommendations. If a user clicks on one of
> our recommendations, it's a signal for us that the items are related. So
> next time this link should have a higher score. What are the approaches
> to do this?

Yes, you do want the things that lead to purchases to go into the training
data. This is good intuition. But you don’t do it with weights; you train on
new purchases, regardless of whether they came from random views, rec-views,
or anything else. You don’t care whether a rec was clicked on; you care
whether a purchase was made, and you don’t care what part of the UI caused
it. UI analysis is very, very important, but it doesn’t help the
recommender; it guides UI decisions. So measuring clicks is good but
shouldn’t be used to change recs.

One way to increase the value of your recs is to add a little randomness to
their ordering. If you have 10 things to recommend, get 20 from
itemsimilarity and apply a normally distributed random weighting, then
re-sort and show the top 10. This will move some things up in the order and
surface items that would never be shown without the re-ordering.
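The dithering step above can be sketched in a few lines of Python. This is
one plausible reading of “normally distributed random weighting” (additive
Gaussian noise on the scores); the candidate list and `sigma` value are
illustrative assumptions, and you would tune the noise scale to your own
score distribution.

```python
import random

def dither(recs, show=10, sigma=0.1):
    """recs: list of (item, score) pairs, best first.
    Perturb each score with normally distributed noise, re-sort,
    and return the top `show` items."""
    noisy = [(item, score + random.gauss(0.0, sigma))
             for item, score in recs]
    noisy.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in noisy[:show]]

# Fetch ~20 candidates from itemsimilarity, then show 10 after dithering.
candidates = [("item-%d" % i, 1.0 - i * 0.04) for i in range(20)]
print(dither(candidates))
```

With a small `sigma`, the strongest recs usually stay near the top while
items from positions 11-20 occasionally get exposure, which is the point.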
The technique allows you to expose more items to possible purchase and
therefore affect the ordering the next time you train. The actual algorithm
takes more space to describe, but the idea is a lot like a multi-armed
bandit, where the best bandit eventually gets all the trials. In this case
the best rec leads to a purchase, gets into the new training data, and so
will be shown more often the next time.

Another thing you can do is create a “shopping cart” recommender. This looks
at items purchased together: an item-set. It is a strong indicator of
relatedness.

Suggestions:
1) Personalize: this is likely to make the most difference, since you will
be showing different things to different people. The “Practical Machine
Learning” book is short and easy to read, and it describes this.
2) Move to purchase-data training; wait for cross-cooccurrence to add in
view data. Do this if you have good coverage (Ted’s questions below relate
to this).
3) Increase the training period if needed to get good catalog coverage.
4) Consider dithering your recs to expose more items to purchase, and
therefore self-tune by increasing the quality of your training data.

> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
>
>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak <[email protected]>
>> wrote:
>>
>>> What could be the reason for recommending "Water heat device" for the
>>> iPhone? The iPhone is one of the most popular items. There should be a
>>> lot of people viewing the iPhone with "Water heat device"?
>>
>> What are the numbers?
>>
>> How many people got each item? How many people total? How many people
>> got both?
>>
>> What about the same for the iPhone-related items?
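Ted’s questions amount to filling in a 2x2 contingency table for a pair of
items: users who got each item, users who got both, and the total user
count. These are the same inputs a log-likelihood-ratio cooccurrence test
works from. A minimal sketch of the counting, with hypothetical data:

```python
from collections import defaultdict

def cooccurrence_counts(events, item_a, item_b):
    """events: iterable of (user, item) pairs, one per interaction.
    Returns the counts Ted asks for: how many people got each item,
    how many got both, and how many people there are in total."""
    users_by_item = defaultdict(set)
    all_users = set()
    for user, item in events:
        users_by_item[item].add(user)
        all_users.add(user)
    a, b = users_by_item[item_a], users_by_item[item_b]
    return {"a": len(a), "b": len(b),
            "both": len(a & b), "total": len(all_users)}

# Hypothetical data: 5 users viewed the iPhone, 2 viewed the water
# heater, and exactly 1 user viewed both.
events = [("u1", "iphone"), ("u2", "iphone"), ("u3", "iphone"),
          ("u4", "iphone"), ("u5", "iphone"),
          ("u5", "water-heater"), ("u6", "water-heater")]
print(cooccurrence_counts(events, "iphone", "water-heater"))
# -> {'a': 5, 'b': 2, 'both': 1, 'total': 6}
```

Comparing `both` against what you would expect by chance from `a`, `b`, and
`total` is what tells you whether a pairing like iPhone and water heater is
a real signal or just the popularity of the iPhone.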
