Re: mapreduce ItemSimilarity input optimization

Serega Sheypak Thu, 21 Aug 2014 23:32:56 -0700

Ok, I got it. Is it Ted's book?
http://www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684/ref=la_B00EHXC1NK_1_1?s=books&ie=UTF8&qid=1408689021&sr=1-1


I've read this one:
http://www.amazon.com/Apache-Mahout-Cookbook-Piero-Giacomelli-ebook/dp/B00HJR6R86/ref=sr_1_2?s=books&ie=UTF8&qid=1408689063&sr=1-2&keywords=mahout

It wasn't satisfying at all.




2014-08-21 20:32 GMT+04:00 Pat Ferrel <[email protected]>:

> Sorry that wasn’t clear.
>
> Given your purchase volume, you may not have very good coverage if you
> train on purchases only, so using views may be your best bet. Ted’s metrics
> were: “How many people got each item?  How many people total?  How many
> people got both?” This is how you tell which action has enough data to be
> useful. In your case that may be views.
>
> The other point was about doing personalization. Since you have
> itemsimilarity working well, you can add personalization with a search
> engine, using the methods described in Ted’s book. This requires that you
> capture user history (views in this case) and use it as a query on the
> itemsimilarity data. If you know enough of the current user’s recent
> history, this will allow you to show “people with the same taste as you
> also looked at these items”.
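A minimal in-memory sketch of that query step, assuming the itemsimilarity output has been loaded into a dict from each item to its list of similar items (its "indicators"). In a real deployment those lists would be indexed as a field in a search engine and the engine would do the scoring; here a plain overlap count stands in for that scoring, and the function and item names are hypothetical.

```python
from collections import Counter


def personalize(indicators, recent_history, k=10):
    """Rank items by how many of their indicators appear in the user's recent history.

    indicators: dict mapping item -> list of similar items (from itemsimilarity).
    recent_history: items the current user viewed recently; this acts as the "query".
    """
    history = set(recent_history)
    scores = Counter()
    for item, similar in indicators.items():
        if item in history:
            continue  # don't recommend what the user has just looked at
        overlap = len(history.intersection(similar))
        if overlap:
            scores[item] = overlap
    # Highest overlap first; ties broken by item id so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


indicators = {
    "case_xxx": ["iphone", "charger"],
    "case_yyy": ["iphone", "screen_guard"],
    "kettle": ["toaster"],
    "charger": ["iphone", "case_xxx"],
}
print(personalize(indicators, ["iphone", "case_xxx"]))
```

Two users with different recent histories get different lists from the same indicator data, which is the point of personalizing.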
>
> Currently you are not personalizing; you are showing the same “similar
> items” to every user. That is fine, but personalization may improve things
> further.
>
>
> On Aug 21, 2014, at 8:08 AM, Serega Sheypak <[email protected]>
> wrote:
>
> Excuse me, it looks like I've missed an important point:
> "Ah, then using Ted’s metrics views _is_ probably your best bet."
> Are you talking about serving "personal recommendations" from a search
> engine? The idea is to take the active visitor's view history and fetch
> "similar" view histories from the search engine at runtime?
>
>
> 2014-08-21 18:50 GMT+04:00 Pat Ferrel <[email protected]>:
>
> >>
> >> On Aug 21, 2014, at 1:22 AM, Serega Sheypak <[email protected]> wrote:
> >>
> >>>> What you are doing is best practice for showing similar “views”. The
> >> technique for using multiple actions will be covered in a series of blog
> >> posts and may be put on the Mahout site eventually.
> >> Great, thanks!
> >>
> >>>> People look at 100 things and buy 1, as you say. The question is: Do you
> >> want people to buy something or just browse your site?
> >> No objection to your point; I understand it. It should work for a pretty
> >> big e-commerce site, right? A small e-commerce site sells 100-200 items per
> >> day and has a wide range of items.
> >
> > Ah, then using Ted’s metrics views _is_ probably your best bet. You can
> > probably still personalize view recommendations. Since you are already
> > using itemsimilarity, it can be a second step that builds on the first.
> >
> >>
> >>>> Filter out any items not in the catalog from your recommendations.
> >> We do that at the data preparation stage. We recalculate item similarity
> >> each day over a sliding 60-day window, excluding unavailable items at the
> >> preparation stage.
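A sketch of that preparation step, assuming interaction events are `(user, item, timestamp)` tuples and the set of currently available items is known. The record layout and the function name are hypothetical; only the 60-day window comes from the thread.

```python
from datetime import datetime, timedelta


def prepare_input(events, available_items, now, window_days=60):
    """Keep events from the last `window_days` days whose item is still available."""
    cutoff = now - timedelta(days=window_days)
    return [
        (user, item, ts)
        for user, item, ts in events
        if ts >= cutoff and item in available_items
    ]


now = datetime(2014, 8, 21)
events = [
    ("u1", "iphone", datetime(2014, 8, 20)),
    ("u2", "old_phone", datetime(2014, 8, 19)),  # no longer in the catalog
    ("u3", "iphone", datetime(2014, 5, 1)),      # older than 60 days
]
print(prepare_input(events, {"iphone"}, now))
```

Widening the window only means raising `window_days`; the catalog filter keeps it from pulling in items that can no longer be bought.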
> >>
> >> Thank you! We got good results, and the business people are satisfied :)
> >>
> >>
> >> 2014-08-20 20:28 GMT+04:00 Pat Ferrel <[email protected]>:
> >>
> >>>>
> >>>> On Aug 19, 2014, at 11:26 PM, Serega Sheypak <[email protected]>
> >>>> wrote:
> >>>>
> >>>> Hi!
> >>>> 1. There was a bug in the UI; I've checked the raw recommendations.
> >>>> "Water heating device" has a low score. The first 30 recommended items
> >>>> really fit the iPhone; the next ones are not so good. Anyway, the result
> >>>> is good, thank you very much.
> >>>> 2. I've inspected users' "sessions", and there really are people who
> >>>> viewed the iPhone and the heating device: 10 people in the last month.
> >>>> 3. I will calculate a relative measure. I haven't yet calculated what
> >>>> percentage of all people these are, or how they influence the score.
> >>>>
> >>>
> >>> That’s great. The Spark version sorts the result by weight, but I think
> >>> the mapreduce version doesn't.
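If the mapreduce output needs ordering, a small post-processing pass can do it. The sketch assumes each output line holds `itemA`, `itemB` and a similarity weight separated by tabs; check the actual layout of your job's output before relying on this, and treat the function name as hypothetical.

```python
from collections import defaultdict


def top_similar(lines, k=10):
    """Group 'itemA<TAB>itemB<TAB>weight' lines by itemA and keep the k heaviest."""
    by_item = defaultdict(list)
    for line in lines:
        item_a, item_b, weight = line.rstrip("\n").split("\t")
        by_item[item_a].append((item_b, float(weight)))
    # Sort each item's neighbours by descending weight and truncate to k.
    return {
        item: sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
        for item, pairs in by_item.items()
    }


lines = ["iphone\tcase_xxx\t0.4", "iphone\tcharger\t0.9", "iphone\theater\t0.1"]
print(top_similar(lines, k=2))
```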
> >>>
> >>>> *You wrote:*
> >>>> Then once you have that working you can add more actions but only with
> >>>> cross-cooccurrence, adding by weighting* will not work with this type of
> >>>> recommender*. Which recommender can work with weights for actions?
> >>>
> >>> What you are doing is best practice for showing similar “views”. The
> >>> technique for using multiple actions will be covered in a series of blog
> >>> posts and may be put on the Mahout site eventually. It requires
> >>> spark-itemsimilarity. For now I’d strongly suggest you look at training on
> >>> purchase data alone; see the comments below.
> >>>
> >>>>
> >>>> *About building recommendations using sales.*
> >>>> Sales are less than 1% of item views. You will recommend only stuff
> >>>> people buy.
> >>>
> >>> The point is not volume of data but quality of data. I once measured how
> >>> predictive of purchases the views were and found them a rather poor
> >>> predictor. People look at 100 things and buy 1, as you say. The question
> >>> is: Do you want people to buy something or just browse your site?
> >>>
> >>> On the other hand, you would need to check how good your purchase
> >>> coverage is. Do you have enough items purchased by several people (Ted’s
> >>> questions below will guide you)? If there is good coverage then you _do
> >>> not_ restrict the range by using only purchase data. You increase the
> >>> quality.
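One way to put a number on coverage, assuming purchases are `(user, item)` pairs: count distinct buyers per item and see what share of the catalog has at least some minimum number of them. The threshold `min_users` is an arbitrary illustrative choice, not a Mahout setting.

```python
from collections import defaultdict


def coverage(purchases, catalog, min_users=3):
    """Fraction of catalog items bought by at least `min_users` distinct users."""
    buyers = defaultdict(set)
    for user, item in purchases:
        buyers[item].add(user)  # a set, so repeat purchases by one user count once
    covered = [item for item in catalog if len(buyers[item]) >= min_users]
    return len(covered) / len(catalog)


purchases = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u1", "b")]
print(coverage(purchases, ["a", "b", "c", "d"]))
```

Running the same check on views versus purchases shows which action has enough data to be useful.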
> >>>
> >>>> If you recommend what people view, you significantly widen the range
> >>>> of possible purchases. People always buy case "XXX" with the iPhone, so
> >>>> you would never recommend case "YYY" to them. If people view both "XXX"
> >>>> and "YYY", it's reasonable to recommend "YYY". Maybe "YYY" is more
> >>>> expensive, and that is why people prefer the cheaper "XXX". What's wrong
> >>>> with this assumption?
> >>>
> >>> Nothing at all. Remember that your goal is to cause a purchase, but using
> >>> views requires some “scrubbing” of views. You want, in effect,
> >>> views-that-lead-to-purchases. In a cooccurrence recommender this can be
> >>> done with cross-cooccurrence. I’ll describe that elsewhere; it’s too long
> >>> to describe in an email but pretty easy to use.
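The full technique is beyond an email, as noted, but its core counting step can be sketched: for each user, pair every purchased item with every viewed item, which yields the entries of the matrix product P^T V (purchase matrix transposed, times view matrix). This is an illustration of the counting only, with hypothetical names; an implementation such as spark-itemsimilarity also filters these counts with a log-likelihood ratio test before using them.

```python
from collections import Counter


def cross_cooccurrence(purchases, views):
    """Count, for each (purchased item, viewed item) pair, how many users did both.

    purchases, views: dicts mapping user -> set of items.
    """
    counts = Counter()
    for user, bought in purchases.items():
        for p in bought:
            for v in views.get(user, set()):
                counts[(p, v)] += 1
    return counts


purchases = {"u1": {"case_xxx"}, "u2": {"case_xxx"}}
views = {"u1": {"case_xxx", "case_yyy"}, "u2": {"case_yyy"}, "u3": {"case_yyy"}}
print(cross_cooccurrence(purchases, views)[("case_xxx", "case_yyy")])  # 2
```

A high count for ("case_xxx", "case_yyy") says that viewing "YYY" tends to precede buying "XXX", which is the kind of view that leads to a purchase.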
> >>>
> >>> I’d wager that if you restrict to purchases, your sales will go up compared
> >>> with recommending views. But that is without looking at your data. If you
> >>> need more data, try increasing the sliding time window to add more
> >>> purchases. This will eventually start including things that are no longer
> >>> in your catalog, so it will have diminishing returns, but 60 days seems
> >>> like a short time period. Filter out any items not in the catalog from
> >>> your recommendations.
> >>>
> >>> You want recency to matter; this is good intuition. The in-catalog filter
> >>> is one simple way, and there are others when you get to personalization.
> >>>
> >>>>
> >>>> *About our obsessive desire to add weights for actions.*
> >>>> We would like to self-tune our recommendations. If a user clicks our
> >>>> recommendation, it's a signal for us that the items are related, so next
> >>>> time this link should have a higher score. What are the approaches to
> >>>> doing this?
> >>>>
> >>>
> >>> Yes, you do want the things that lead to purchases to go into the
> >>> training data. This is good intuition. But you don’t do it with weights;
> >>> you train on new purchases, regardless of whether they came from random
> >>> views, rec-views, or … You don’t care whether a rec was clicked on; you
> >>> care whether a purchase was made, and you don’t care what part of the UI
> >>> caused it. UI analysis is very, very important but doesn’t help the
> >>> recommender; it guides UI decisions. So measuring clicks is good, but it
> >>> shouldn’t be used to change recs.
> >>>
> >>> One way to increase the value of your recs is to add a little randomness
> >>> to their ordering. If you have 10 things to recommend, get 20 from
> >>> itemsimilarity, apply a normally distributed random weighting, then
> >>> re-sort and show the top 10. This will move some items up in the order
> >>> and show items that would never be shown without the re-ordering. The
> >>> technique allows you to expose more items to possible purchase and
> >>> therefore affect the ordering the next time you train. The actual
> >>> algorithm takes more space to describe, but the idea is a lot like a
> >>> multi-armed bandit, where the best bandit eventually gets all the trials.
> >>> In this case the best rec leads to a purchase, gets into the new training
> >>> data, and so will be shown more often the next time.
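A sketch of the re-ordering step described above. The exact noise model is an assumption: this version adds Gaussian noise to log(rank) (one common formulation of dithering) rather than to the raw similarity weights, so the amount of shuffling does not depend on the weights' scale. `epsilon` is a hypothetical tuning knob; larger values shuffle more.

```python
import math
import random


def dither(ranked_items, show=10, epsilon=2.0, rng=None):
    """Re-order a best-first list using Gaussian noise on log(rank), then keep `show` items."""
    rng = rng or random.Random()
    sigma = math.log(epsilon)  # noise scale; epsilon=1 gives sigma=0, i.e. no dithering
    keyed = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    keyed.sort(key=lambda pair: pair[0])  # smaller synthetic score is shown earlier
    return [item for _, item in keyed[:show]]


candidates = [f"item{i}" for i in range(1, 21)]  # e.g. 20 results from itemsimilarity
print(dither(candidates, show=10, rng=random.Random(42)))
```

Because log(rank) grows slowly at the bottom of the list, top items mostly stay near the top while lower-ranked ones occasionally surface, giving them a chance to be purchased and enter the next training set.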
> >>>
> >>> Another thing you can do is create a “shopping cart” recommender. This
> >>> looks at items purchased together (an item-set), which is a strong
> >>> indicator of relatedness.
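A sketch of the item-set counting behind such a "shopping cart" recommender, assuming each order is given as a collection of item ids. The names are hypothetical, and the raw pair counts would still need a significance filter (for example, a log-likelihood ratio test) before being used as recommendations.

```python
from collections import Counter
from itertools import combinations


def cart_pairs(orders):
    """Count how many orders contain each unordered pair of distinct items."""
    counts = Counter()
    for order in orders:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(order)), 2):
            counts[(a, b)] += 1
    return counts


orders = [["iphone", "case_xxx"], ["iphone", "case_xxx", "charger"], ["kettle"]]
print(cart_pairs(orders).most_common(1))  # [(('case_xxx', 'iphone'), 2)]
```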
> >>>
> >>> Suggestions:
> >>> 1) Personalize: this is likely to make the most difference, since you
> >>> will be showing different things to different people. The “Practical
> >>> Machine Learning” book is short and easy to read, and it describes this.
> >>> 2) Move to training on purchase data, and wait for cross-cooccurrence to
> >>> add in view data. Do this if you have good coverage (Ted’s questions
> >>> below relate to this).
> >>> 3) Increase the training period if needed to get good catalog coverage.
> >>> 4) Consider dithering your recs to expose more items to purchase and
> >>> therefore self-tune by increasing the quality of your training data.
> >>>
> >>>>
> >>>>
> >>>>
> >>>> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
> >>>>
> >>>>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> What could be the reason for recommending "Water heat device" for the
> >>>>>> iPhone? The iPhone is one of the most popular items. Should there be a
> >>>>>> lot of people viewing the iPhone together with "Water heat device"?
> >>>>>>
> >>>>>
> >>>>> What are the numbers?
> >>>>>
> >>>>> How many people got each item?  How many people total?  How many people
> >>>>> got both?
> >>>>>
> >>>>> What about the same for the iPhone related items?
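Ted's questions supply exactly the four cells of a 2x2 contingency table, and Mahout offers a log-likelihood ratio (LLR) similarity that scores such tables. The sketch below writes that calculation from the standard entropy formulation; the function names are hypothetical and the example counts are invented for illustration.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy (N * H) of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(both, a_only, b_only, neither):
    """Log-likelihood ratio for a 2x2 table of user counts."""
    row = entropy(both + a_only, b_only + neither)
    col = entropy(both + b_only, a_only + neither)
    mat = entropy(both, a_only, b_only, neither)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative float error


def table(n_a, n_b, n_both, n_total):
    """Turn Ted's questions into the table: (both, A only, B only, neither)."""
    return n_both, n_a - n_both, n_b - n_both, n_total - n_a - n_b + n_both


# Invented counts: users who saw item A, item B, both, and all users.
print(llr(*table(n_a=5000, n_b=40, n_both=10, n_total=200000)))
```

An LLR near zero means the pair co-occurs about as often as chance predicts. A very popular item like the iPhone co-occurs with many things by chance alone, which is why raw co-occurrence counts can mislead.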
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
