Ok, I got it. Is it Ted's book? http://www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684/ref=la_B00EHXC1NK_1_1?s=books&ie=UTF8&qid=1408689021&sr=1-1
I've read this one:
http://www.amazon.com/Apache-Mahout-Cookbook-Piero-Giacomelli-ebook/dp/B00HJR6R86/ref=sr_1_2?s=books&ie=UTF8&qid=1408689063&sr=1-2&keywords=mahout
No satisfaction at all.

2014-08-21 20:32 GMT+04:00 Pat Ferrel <[email protected]>:

> Sorry that wasn’t clear.
>
> Given your purchase volume, you may not have very good coverage training
> on purchases only. So using views may be your best bet. Ted’s metrics were:
> “How many people got each item? How many people total? How many people
> got both?” This is how you tell what action has enough data to be useful.
> In your case that may be views.
>
> The other point was about doing personalization. Since you have
> itemsimilarity working well you can add personalization with a search
> engine using methods described in Ted’s book. This requires that you
> capture user history (views in this case) and use that as a query on the
> itemsimilarity data. If you know enough of the current user’s recent
> history this will allow you to show “people with the same taste as you also
> looked at these items”.
>
> Currently you are not personalizing, you are showing the same “similar
> items” to every user. That is fine but personalization may improve things
> further.
>
>
> On Aug 21, 2014, at 8:08 AM, Serega Sheypak <[email protected]> wrote:
>
> Excuse me, looks like I've missed an important point:
> "Ah, then using Ted’s metrics views _is_ probably your best bet."
> You are talking about serving "personal recommendations" from a search
> engine? The idea was to get the active visitor's view history and give him
> "similar" view histories from the search engine at runtime?
>
>
> 2014-08-21 18:50 GMT+04:00 Pat Ferrel <[email protected]>:
>
> >>
> >> On Aug 21, 2014, at 1:22 AM, Serega Sheypak <[email protected]> wrote:
> >>
> >>>> What you are doing is best practice for showing similar “views”. The
> >>>> technique for using multiple actions will be covered in a series of
> >>>> blog posts and may be put on the Mahout site eventually
> >> Great, thanks!
> >>
> >>>> People look at 100 things and buy 1, as you say. The question is: Do you
> >>>> want people to buy something or just browse your site?
> >> No objections to your point. I understand it. It should work for pretty
> >> big ecom, right? Small ecom sells 100-200 items per day and has a wide
> >> range of items.
> >
> > Ah, then using Ted’s metrics views _is_ probably your best bet. You can
> > probably still personalize view recommendations. Since you are already
> > using itemsimilarity it can be a second step that builds on the first.
> >
> >>
> >>>> Filter out any items not in the catalog from your recommendations.
> >> We have it at the data preparation stage. We recalculate item similarity
> >> each day, sliding back for 60 days and excluding non-available items at
> >> the preparation stage.
> >>
> >> Thank you! We did reach good results, business guys got satisfaction :)
> >>
> >>
> >> 2014-08-20 20:28 GMT+04:00 Pat Ferrel <[email protected]>:
> >>
> >>>>
> >>>> On Aug 19, 2014, at 11:26 PM, Serega Sheypak <[email protected]>
> >>>> wrote:
> >>>>
> >>>> Hi!
> >>>> 1. There was a bug in the UI, I've checked the raw recommendations.
> >>>> "water heating device" has a low score. So the first 30 recommended
> >>>> items really fit iPhone, the next are not so good. Anyway the result
> >>>> is good, thank you very much.
> >>>> 2. I've inspected "sessions" of users, there really are people who
> >>>> viewed iphone and heating device. 10 people for the last month.
> >>>> 3. I will calculate a relative measurement, I didn't calc what % these
> >>>> people are compared to others and how they influence the score result.
> >>>>
> >>>
> >>> That’s great. The Spark version sorts the result by weights, but I think
> >>> the mapreduce version doesn't
> >>>
> >>>> *You wrote:*
> >>>> Then once you have that working you can add more actions but only with
> >>>> cross-cooccurrence, adding by weighting *will not work with this type of
> >>>> recommender*, which recommender can work with weights for actions?
> >>>
> >>> What you are doing is best practice for showing similar “views”. The
> >>> technique for using multiple actions will be covered in a series of
> >>> blog posts and may be put on the Mahout site eventually. It requires
> >>> spark-itemsimilarity. For now I’d strongly suggest you look at training
> >>> on purchase data alone - see the comments below.
> >>>
> >>>>
> >>>> *About building recommendations using sales.*
> >>>> Sales are less than 1% of item views. You will recommend only stuff
> >>>> people buy.
> >>>
> >>> The point is not volume of data but quality of data. I once measured how
> >>> predictive of purchases the views were and found them a rather poor
> >>> predictor. People look at 100 things and buy 1, as you say. The question
> >>> is: Do you want people to buy something or just browse your site?
> >>>
> >>> On the other hand you would need to see how good your coverage is of
> >>> purchases. Do you have enough items purchased by several people (Ted’s
> >>> questions below will guide you)? If there is good coverage then you _do
> >>> not_ restrict the range by using only purchase data. You increase the
> >>> quality.
> >>>
> >>>> If you recommend what people see you significantly widen the range
> >>>> of possible buy actions. People always buy case "XXX" with iphone. You
> >>>> would never recommend them to buy case "YYY". If people watch "XXX" and
> >>>> "YYY" it's reasonable to recommend "YYY". Maybe "YYY" is more expensive,
> >>>> that is why people prefer cheaper "XXX". What's wrong with this
> >>>> assumption?
> >>>
> >>> Nothing at all. Remember that your goal is to cause a purchase but using
> >>> views requires some “scrubbing” of views. You want, in effect,
> >>> views-that-lead-to-purchases. In a cooccurrence recommender this can be
> >>> done with cross-cooccurrence and I’ll describe that elsewhere, it’s too
> >>> long for an email to describe but pretty easy to use.
> >>>
> >>> I’d wager that if you restrict to purchases your sales will go up over
> >>> recommending views. But that is without looking at your data. If you need
> >>> more data try increasing the sliding time window to add more purchases.
> >>> This will eventually start including things that are no longer in your
> >>> catalog so will have diminishing returns but 60 days seems like a short
> >>> time period. Filter out any items not in the catalog from your
> >>> recommendations.
> >>>
> >>> You want recency to matter, this is good intuition. The in-catalog filter
> >>> is one simple way, and there are others when you get to personalization.
> >>>
> >>>>
> >>>> *About our obsessive desire to add weights for actions.*
> >>>> We would like to self-tune our recommendations. If a user clicks our
> >>>> recommendation it's a signal for us that the items are related. So next
> >>>> time this link should have a higher score. What are the approaches to
> >>>> do it?
> >>>>
> >>>
> >>> Yes, you do want the things that lead to purchases to go into the
> >>> training data. This is good intuition. But you don’t do it with weights,
> >>> you train on new purchases, regardless of whether they came from random
> >>> views, rec-views, or … You don’t care whether a rec was clicked on; you
> >>> care if a purchase was made and you don’t care what part of the UI caused
> >>> it. UI analysis is very very important but doesn’t help the recommender,
> >>> it guides UI decisions. So measuring clicks is good but shouldn’t be used
> >>> to change recs.
> >>>
> >>> One way to increase the value of your recs is to add a little randomness
> >>> to their ordering. If you have 10 things to recommend get 20 from
> >>> itemsimilarity and apply a normally distributed random weighting, then
> >>> re-sort and show the top 10. This will move some things up in order and
> >>> show them where without the re-ordering they would never be shown. The
> >>> technique allows you to expose more items to possible purchase and
> >>> therefore affect the ordering the next time you train. The actual
> >>> algorithm takes more space to describe but the idea is a lot like a
> >>> multi-armed bandit where the best bandit eventually gets all trials. In
> >>> this case the best rec leads to a purchase and gets into the new training
> >>> data and so will be shown more often the next time.
> >>>
> >>> Another thing you can do is create a “shopping cart” recommender. This
> >>> looks at items purchased together, an item-set. It is a strong indicator
> >>> of relatedness.
> >>>
> >>> Suggestions:
> >>> 1) personalize: this is likely to make the most difference since you will
> >>> be showing different things to different people. The “Practical Machine
> >>> Learning” is short and easy to read; it describes this.
> >>> 2) move to purchase data training, wait for cross-cooccurrence to add in
> >>> view data. Do this if you have good coverage (Ted’s questions below
> >>> relate to this).
> >>> 3) increase the training period if needed to get good catalog coverage
> >>> 4) consider dithering your recs to expose more items to purchase and
> >>> therefore self-tune by increasing the quality of your training data.
> >>>
> >>>>
> >>>> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
> >>>>
> >>>>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> What could be a reason for recommending "Water heat device" to iPhone?
> >>>>>> iPhone is one of the most popular items. There should be a lot of
> >>>>>> people viewing iPhone with "Water heat device"?
> >>>>>>
> >>>>>
> >>>>> What are the numbers?
> >>>>>
> >>>>> How many people got each item? How many people total? How many people
> >>>>> got both?
> >>>>>
> >>>>> What about the same for the iPhone related items?
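
Ted's three questions in the thread (how many people got each item, how many people in total, how many got both) are the four numbers of a 2x2 contingency table. Below is a minimal sketch of turning them into a single score, assuming a log-likelihood ratio (G^2) test. The thread does not say which similarity measure was used, so that choice is an assumption. The function names are hypothetical, and every count in the example except the 10 joint viewers is made up for illustration.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # k11: both items, k12: A only, k21: B only, k22: neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def cooccurrence_llr(n_a, n_b, n_both, n_total):
    # Build the 2x2 table from the answers to the three questions.
    k11 = n_both
    k12 = n_a - n_both
    k21 = n_b - n_both
    k22 = n_total - n_a - n_b + n_both
    if min(k11, k12, k21, k22) < 0:
        raise ValueError("inconsistent counts")
    return llr(k11, k12, k21, k22)


if __name__ == "__main__":
    # Hypothetical counts: only the 10 joint viewers come from the thread.
    print(cooccurrence_llr(n_a=5000, n_b=40, n_both=10, n_total=100000))
```

A score near zero means the two items co-occur about as often as chance would predict. Note that this test is symmetric: it also grows when two items co-occur much less often than expected, so the raw counts still need to be looked at.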

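
The dithering idea from the thread (fetch about 20 candidates, apply a normally distributed random weighting, re-sort, show the top 10) can be sketched as follows. Pat notes that the actual algorithm takes more space to describe, so this is only an illustration under stated assumptions: the noise is added to the log of each item's rank, and the names `dither`, `k` and `sigma` are hypothetical, not taken from Mahout.

```python
import math
import random


def dither(ranked_items, k=10, sigma=0.5, rng=None):
    """Return k items from ranked_items after a noisy re-sort.

    ranked_items must already be ordered best-first, e.g. the top 20
    results from itemsimilarity. Larger sigma means more shuffling.
    """
    rng = rng or random.Random()
    noisy = []
    for rank, item in enumerate(ranked_items, start=1):
        # Assumed scoring: log(rank) plus Gaussian noise, lower is better.
        noisy.append((math.log(rank) + rng.gauss(0.0, sigma), item))
    noisy.sort(key=lambda pair: pair[0])
    return [item for _, item in noisy[:k]]


if __name__ == "__main__":
    candidates = ["item%d" % i for i in range(1, 21)]
    print(dither(candidates, k=10, sigma=0.5, rng=random.Random(42)))
```

Using the log of the rank keeps the top few results fairly stable while letting items further down occasionally surface, which is the point of exposing more items to possible purchase.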