> On Aug 21, 2014, at 1:22 AM, Serega Sheypak <[email protected]> wrote:
>
>>> What you are doing is best practice for showing similar “views”. The
>>> technique for using multiple actions will be covered in a series of blog
>>> posts and may be put on the Mahout site eventually.
>
> Great, thanks!
>
>>> People look at 100 things and buy 1, as you say. The question is: Do you
>>> want people to buy something or just browse your site?
>
> No objections to your point, I understand it. It should work for a pretty
> big e-commerce site, right? A small e-commerce site sells 100-200 items per
> day and has a wide range of items.
Ah, then using Ted’s metrics, views _is_ probably your best bet. You can
probably still personalize view recommendations. Since you are already using
itemsimilarity, it can be a second step that builds on the first.

>>> Filter out any items not in the catalog from your recommendations.
>
> We have that at the data preparation stage. We recalculate item similarity
> each day, sliding back over 60 days and excluding non-available items
> during preparation.
>
> Thank you! We did reach good results; the business guys are satisfied :)
>
>
> 2014-08-20 20:28 GMT+04:00 Pat Ferrel <[email protected]>:
>
>>> On Aug 19, 2014, at 11:26 PM, Serega Sheypak <[email protected]> wrote:
>>>
>>> Hi!
>>> 1. There was a bug in the UI; I've checked the raw recommendations. "Water
>>> heating device" has a low score. So the first 30 recommended items really
>>> fit the iPhone, the next are not so good. Anyway the result is good,
>>> thank you very much.
>>> 2. I've inspected "sessions" of users; there really are people who viewed
>>> the iPhone and the heating device: 10 people in the last month.
>>> 3. I will calculate a relative measurement; I didn't calculate what % of
>>> these people there are compared to the others and how they influence the
>>> score result.
>>
>> That’s great. The Spark version sorts the result by weights, but I think
>> the mapreduce version doesn't.
>>
>>> *You wrote:*
>>> Then once you have that working you can add more actions but only with
>>> cross-cooccurrence; adding by weighting *will not work with this type of
>>> recommender*. Which recommender can work with weights for actions?
>>
>> What you are doing is best practice for showing similar “views”. The
>> technique for using multiple actions will be covered in a series of blog
>> posts and may be put on the Mahout site eventually. It requires
>> spark-itemsimilarity. For now I’d strongly suggest you look at training on
>> purchase data alone - see the comments below.
>>
>>> *About building recommendations using sales.*
>>> Sales are less than 1% of item views. You will recommend only stuff
>>> people buy.
>>
>> The point is not the volume of data but the quality of data. I once
>> measured how predictive of purchases the views were and found them a
>> rather poor predictor. People look at 100 things and buy 1, as you say.
>> The question is: Do you want people to buy something or just browse your
>> site?
>>
>> On the other hand you would need to see how good your coverage of
>> purchases is. Do you have enough items purchased by several people (Ted’s
>> questions below will guide you)? If there is good coverage then you _do
>> not_ restrict the range by using only purchase data. You increase the
>> quality.
>>
>>> If you recommend what people see you significantly widen the range of
>>> possible buy actions. People always buy case "XXX" with the iPhone. You
>>> would never recommend them case "YYY". If people view "XXX" and "YYY"
>>> it's reasonable to recommend "YYY". Maybe "YYY" is more expensive, which
>>> is why people prefer the cheaper "XXX". What's wrong with this
>>> assumption?
>>
>> Nothing at all. Remember that your goal is to cause a purchase, but using
>> views requires some “scrubbing” of views. You want, in effect,
>> views-that-lead-to-purchases. In a cooccurrence recommender this can be
>> done with cross-cooccurrence and I’ll describe that elsewhere; it’s too
>> long to describe in an email but pretty easy to use.
>>
>> I’d wager that if you restrict to purchases your sales will go up over
>> recommending views. But that is without looking at your data.
>> If you need more data, try increasing the sliding time window to add more
>> purchases. This will eventually start including things that are no longer
>> in your catalog, so it will have diminishing returns, but 60 days seems
>> like a short time period. Filter out any items not in the catalog from
>> your recommendations.
>>
>> You want recency to matter; this is good intuition. The in-catalog filter
>> is one simple way, and there are others when you get to personalization.
>>
>>> *About our obsessive desire to add weights for actions.*
>>> We would like to self-tune our recommendations. If a user clicks one of
>>> our recommendations, it's a signal for us that the items are related, so
>>> next time this link should have a higher score. What are the approaches
>>> to doing this?
>>
>> Yes, you do want the things that lead to purchases to go into the training
>> data. This is good intuition. But you don’t do it with weights; you train
>> on new purchases, regardless of whether they came from random views,
>> rec-views, or … You don’t care whether a rec was clicked on; you care if a
>> purchase was made, and you don’t care what part of the UI caused it. UI
>> analysis is very, very important but doesn’t help the recommender; it
>> guides UI decisions. So measuring clicks is good but shouldn’t be used to
>> change recs.
>>
>> One way to increase the value of your recs is to add a little randomness
>> to their ordering. If you have 10 things to recommend, get 20 from
>> itemsimilarity and apply a normally distributed random weighting, then
>> re-sort and show the top 10. This will move some things up in order and
>> show them where, without the re-ordering, they would never be shown. The
>> technique allows you to expose more items to possible purchase and
>> therefore affect the ordering the next time you train. The actual
>> algorithm takes more space to describe, but the idea is a lot like a
>> multi-armed bandit where the best bandit eventually gets all trials. In
>> this case the best rec leads to a purchase, gets into the new training
>> data, and so will be shown more often the next time.
>>
>> Another thing you can do is create a “shopping cart” recommender. This
>> looks at items purchased together (an item-set). It is a strong indicator
>> of relatedness.
>>
>> Suggestions:
>> 1) Personalize: this is likely to make the most difference since you will
>> be showing different things to different people. The “Practical Machine
>> Learning” book is short and easy to read; it describes this.
>> 2) Move to training on purchase data; wait for cross-cooccurrence to add
>> in view data. Do this if you have good coverage (Ted’s questions below
>> relate to this).
>> 3) Increase the training period if needed to get good catalog coverage.
>> 4) Consider dithering your recs to expose more items to purchase and
>> therefore self-tune by increasing the quality of your training data.
>>
>>>
>>> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
>>>
>>>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak
>>>> <[email protected]> wrote:
>>>>
>>>>> What could be the reason for recommending a "Water heat device" for the
>>>>> iPhone? The iPhone is one of the most popular items. There should be a
>>>>> lot of people viewing the iPhone along with a "Water heat device"?
>>>>
>>>> What are the numbers?
>>>>
>>>> How many people got each item? How many people total? How many people
>>>> got both?
>>>>
>>>> What about the same for the iPhone-related items?
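
For reference, a minimal sketch of the data-preparation step discussed above:
keep a sliding training window and drop items that are no longer sellable. It
is plain Scala, independent of Mahout, and the names (Interaction,
trainingSet) plus the 60-day default are illustrative assumptions, not
anything from the project.

import java.time.{Duration, Instant}

object TrainingWindow {

  // One logged interaction: who acted on which item, and when (illustrative shape).
  case class Interaction(userId: String, itemId: String, timestamp: Instant)

  // Keep interactions from the last `days` days whose item is still available.
  // Widening `days` adds more purchases, at the cost of pulling in items that
  // may have since left the catalog - hence the availability check.
  def trainingSet(log: Seq[Interaction],
                  available: Set[String],
                  days: Long = 60,
                  now: Instant = Instant.now()): Seq[Interaction] = {
    val cutoff = now.minus(Duration.ofDays(days))
    log.filter(i => i.timestamp.isAfter(cutoff) && available.contains(i.itemId))
  }
}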
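
And a minimal sketch of the dithering step Pat describes: fetch more
candidates than you show, drop anything not in the current catalog, apply a
normally distributed random weighting, re-sort, and keep the top N. Again
plain Scala with illustrative names (Rec, dither, sigma); multiplying each
score by exp of a normal draw is just one reasonable way to realize the
random weighting, not the algorithm Pat has in mind.

import scala.util.Random

object Dither {

  // A recommendation as it might come back from itemsimilarity (illustrative shape).
  case class Rec(itemId: String, score: Double)

  // Filter to the catalog, jitter each score with a log-normal factor,
  // re-sort by the jittered score, and keep the top `show` items. Fetch
  // roughly twice as many candidates as you show (20 to show 10) so
  // lower-ranked items get some exposure.
  def dither(candidates: Seq[Rec],
             catalog: Set[String],
             show: Int,
             sigma: Double = 0.3,
             rnd: Random = new Random()): Seq[Rec] = {
    candidates
      .filter(r => catalog.contains(r.itemId))
      .map(r => r -> r.score * math.exp(sigma * rnd.nextGaussian()))
      .sortBy { case (_, jittered) => -jittered }
      .take(show)
      .map { case (rec, _) => rec }
  }

  // Usage: dither(top20FromItemSimilarity, currentCatalogIds, show = 10)
}

A larger sigma shuffles the order more aggressively, exposing more items to a
possible purchase at the cost of showing the best-scored recs less often.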
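
Finally, a sketch of the raw counting behind a “shopping cart” (item-set)
recommender mentioned above: how often two items land in the same order. The
names are illustrative; in practice counts like these would feed a
significance test such as LLR rather than being used directly as scores.

object CartPairs {

  // How often two items appear in the same order. `orderLines` is (orderId, itemId);
  // the result maps an unordered item pair to the number of carts containing both.
  def pairCounts(orderLines: Seq[(String, String)]): Map[(String, String), Int] = {
    orderLines
      .groupBy { case (orderId, _) => orderId }         // gather the lines of each cart
      .values
      .flatMap { lines =>
        val items = lines.map { case (_, itemId) => itemId }.distinct.sorted
        for {
          i <- items.indices
          j <- (i + 1) until items.size
        } yield (items(i), items(j))                    // every unordered pair in the cart
      }
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.size }
  }
}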
