> On Aug 19, 2014, at 11:26 PM, Serega Sheypak <[email protected]> wrote:
>
> Hi!
> 1. There was a bug in the UI; I've checked the raw recommendations. "Water
> heating device" has a low score. So the first 30 recommended items really
> fit the iPhone; the next ones are not so good. Anyway the result is good,
> thank you very much.
> 2. I've inspected the "sessions" of users; there really are people who
> viewed the iPhone and a heating device: 10 people in the last month.
> 3. I will calculate a relative measurement. I didn't calculate what
> percentage these people are of the others, or how they influence the
> score result.
That’s great. The Spark version sorts the result by weights, but I think the
MapReduce version doesn't.

> *You wrote:*
> Then once you have that working you can add more actions but only with
> cross-cooccurrence; adding by weighting *will not work with this type of
> recommender*. Which recommender can work with weights for actions?

What you are doing is best practice for showing similar “views”. The
technique for using multiple actions will be covered in a series of blog
posts and may be put on the Mahout site eventually. It requires
spark-itemsimilarity. For now I’d strongly suggest you look at training on
purchase data alone - see the comments below.

> *About building recommendations using sales.*
> Sales are less than 1% of item views. You will recommend only stuff
> people buy.

The point is not the volume of data but the quality of data. I once measured
how predictive of purchases views were and found them a rather poor
predictor. People look at 100 things and buy 1, as you say. The question is:
do you want people to buy something, or just browse your site?

On the other hand, you need to see how good your coverage of purchases is.
Do you have enough items purchased by several people (Ted’s questions below
will guide you)? If there is good coverage then you do _not_ restrict the
range by using only purchase data; you increase the quality.

> If you recommend what people see, you significantly widen the range of
> possible buy actions. People always buy case "XXX" with the iPhone; you
> would never recommend them case "YYY". If people view both "XXX" and
> "YYY", it's reasonable to recommend "YYY". Maybe "YYY" is more expensive,
> which is why people prefer the cheaper "XXX". What's wrong with this
> assumption?

Nothing at all. Remember that your goal is to cause a purchase, but using
views requires some “scrubbing” of views. You want, in effect,
views-that-lead-to-purchases.
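As a toy illustration of what “views-that-lead-to-purchases” means (the
cross-cooccurrence machinery does this properly inside the recommender),
here is a minimal sketch in Python. The event format and field names are
hypothetical; a real pipeline would also respect session boundaries and
time order, not just user identity.

```python
# Sketch: keep only views from users who also purchased something.
# This is a coarse proxy for "views-that-lead-to-purchases"; the tuple
# layout (user, action, item) is an assumption, not a real log schema.

def scrub_views(events):
    """events: list of (user, action, item) tuples in time order."""
    # Users who produced at least one purchase.
    buyers = {user for user, action, _ in events if action == "purchase"}
    # Keep a view only when the same user bought something.
    return [(u, a, i) for u, a, i in events
            if a == "view" and u in buyers]

events = [
    ("u1", "view", "iphone"),
    ("u1", "view", "case-XXX"),
    ("u1", "purchase", "case-XXX"),
    ("u2", "view", "water-heater"),  # u2 never bought anything
]
print(scrub_views(events))
# -> [('u1', 'view', 'iphone'), ('u1', 'view', 'case-XXX')]
```

The scrubbed views could then feed the view side of a cross-cooccurrence
training run while purchases remain the primary action.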
In a cooccurrence recommender this can be done with cross-cooccurrence. I’ll
describe that elsewhere; it’s too long for an email, but it’s pretty easy to
use. I’d wager that if you restrict to purchases, your sales will go up
compared to recommending from views. But that is without looking at your
data.

If you need more data, try increasing the sliding time window to add more
purchases. This will eventually start including things that are no longer in
your catalog and so will have diminishing returns, but 60 days seems like a
short time period. Filter out any items not in the catalog from your
recommendations. You want recency to matter; this is good intuition. The
in-catalog filter is one simple way, and there are others when you get to
personalization.

> *About our obsessive desire to add weights for actions.*
> We would like to self-tune our recommendations. If a user clicks on one of
> our recommendations, it's a signal for us that the items are related. So
> next time this link should have a higher score. What are the approaches
> to do this?

Yes, you do want the things that lead to purchases to go into the training
data. This is good intuition. But you don’t do it with weights; you train on
new purchases, regardless of whether they came from random views, rec-views,
or anything else. You don’t care whether a rec was clicked on; you care
whether a purchase was made, and you don’t care what part of the UI caused
it. UI analysis is very, very important, but it doesn’t help the
recommender; it guides UI decisions. So measuring clicks is good but
shouldn’t be used to change recs.

One way to increase the value of your recs is to add a little randomness to
their ordering. If you have 10 things to recommend, get 20 from
itemsimilarity and apply a normally distributed random weighting, then
re-sort and show the top 10. This will move some things up in the order and
surface items that would never be shown without the re-ordering.
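The dithering step above can be sketched in a few lines of Python. This is
one plausible reading of “normally distributed random weighting” (additive
Gaussian noise on the scores); the candidate list and `sigma` value are
illustrative assumptions, and you would tune the noise scale to your own
score distribution.

```python
import random

def dither(recs, show=10, sigma=0.1):
    """recs: list of (item, score) pairs, best first.
    Perturb each score with normally distributed noise, re-sort,
    and return the top `show` items."""
    noisy = [(item, score + random.gauss(0.0, sigma))
             for item, score in recs]
    noisy.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in noisy[:show]]

# Fetch ~20 candidates from itemsimilarity, then show 10 after dithering.
candidates = [("item-%d" % i, 1.0 - i * 0.04) for i in range(20)]
print(dither(candidates))
```

With a small `sigma`, the strongest recs usually stay near the top while
items from positions 11-20 occasionally get exposure, which is the point.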
The technique allows you to expose more items to possible purchase and
therefore affect the ordering the next time you train. The actual algorithm
takes more space to describe, but the idea is a lot like a multi-armed
bandit, where the best bandit eventually gets all the trials. In this case
the best rec leads to a purchase, gets into the new training data, and so
will be shown more often the next time.

Another thing you can do is create a “shopping cart” recommender. This looks
at items purchased together: an item-set. It is a strong indicator of
relatedness.

Suggestions:
1) Personalize: this is likely to make the most difference, since you will
be showing different things to different people. The “Practical Machine
Learning” book is short and easy to read, and it describes this.
2) Move to purchase-data training; wait for cross-cooccurrence to add in
view data. Do this if you have good coverage (Ted’s questions below relate
to this).
3) Increase the training period if needed to get good catalog coverage.
4) Consider dithering your recs to expose more items to purchase, and
therefore self-tune by increasing the quality of your training data.

> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
>
>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak <[email protected]>
>> wrote:
>>
>>> What could be the reason for recommending "Water heat device" for the
>>> iPhone? The iPhone is one of the most popular items. There should be a
>>> lot of people viewing the iPhone with "Water heat device"?
>>
>> What are the numbers?
>>
>> How many people got each item? How many people total? How many people
>> got both?
>>
>> What about the same for the iPhone-related items?
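Ted’s questions amount to filling in a 2x2 contingency table for a pair of
items: users who got each item, users who got both, and the total user
count. These are the same inputs a log-likelihood-ratio cooccurrence test
works from. A minimal sketch of the counting, with hypothetical data:

```python
from collections import defaultdict

def cooccurrence_counts(events, item_a, item_b):
    """events: iterable of (user, item) pairs, one per interaction.
    Returns the counts Ted asks for: how many people got each item,
    how many got both, and how many people there are in total."""
    users_by_item = defaultdict(set)
    all_users = set()
    for user, item in events:
        users_by_item[item].add(user)
        all_users.add(user)
    a, b = users_by_item[item_a], users_by_item[item_b]
    return {"a": len(a), "b": len(b),
            "both": len(a & b), "total": len(all_users)}

# Hypothetical data: 5 users viewed the iPhone, 2 viewed the water
# heater, and exactly 1 user viewed both.
events = [("u1", "iphone"), ("u2", "iphone"), ("u3", "iphone"),
          ("u4", "iphone"), ("u5", "iphone"),
          ("u5", "water-heater"), ("u6", "water-heater")]
print(cooccurrence_counts(events, "iphone", "water-heater"))
# -> {'a': 5, 'b': 2, 'both': 1, 'total': 6}
```

Comparing `both` against what you would expect by chance from `a`, `b`, and
`total` is what tells you whether a pairing like iPhone and water heater is
a real signal or just the popularity of the iPhone.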
