Re: mapreduce ItemSimilarity input optimization

Ted Dunning Fri, 22 Aug 2014 12:00:26 -0700

No.

Go for this more recent (and much shorter) one:


http://www.mapr.com/practical-machine-learning

And if you like it, leave a review on Amazon:

http://www.amazon.com/Practical-Machine-Learning-Innovations-Recommendation-ebook/dp/B00JRHVNT4






On Thu, Aug 21, 2014 at 11:31 PM, Serega Sheypak <[email protected]>
wrote:

> Ok, I got it. Is it Ted's book?
>
> http://www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684/ref=la_B00EHXC1NK_1_1?s=books&ie=UTF8&qid=1408689021&sr=1-1
>
> I've read this one:
>
> http://www.amazon.com/Apache-Mahout-Cookbook-Piero-Giacomelli-ebook/dp/B00HJR6R86/ref=sr_1_2?s=books&ie=UTF8&qid=1408689063&sr=1-2&keywords=mahout
>
> No satisfaction at all
>
>
>
>
> 2014-08-21 20:32 GMT+04:00 Pat Ferrel <[email protected]>:
>
> > Sorry that wasn’t clear.
> >
> > Given your purchase volume, you may not have very good coverage when
> > training on purchases only, so using views may be your best bet. Ted’s
> > metrics were: “How many people got each item?  How many people total?
> > How many people got both?” This is how you tell which action has enough
> > data to be useful. In your case that may be views.
> >
> > The other point was about doing personalization. Since you have
> > itemsimilarity working well you can add personalization with a search
> > engine using methods described in Ted’s book. This requires that you
> > capture user history (views in this case) and use that as a query on the
> > itemsimilarity data. If you know enough of the current user’s recent
> > history this will allow you to show “people with the same taste as you
> > also looked at these items”.
> >
> > Currently you are not personalizing, you are showing the same “similar
> > items” to every user. That is fine but personalization may improve things
> > further.
> >
> >
> > On Aug 21, 2014, at 8:08 AM, Serega Sheypak <[email protected]>
> > wrote:
> >
> > Excuse me, it looks like I missed an important point:
> > "Ah, then using Ted’s metrics views _is_ probably your best bet."
> > Are you talking about serving "personal recommendations" from a search
> > engine? Is the idea to take the active visitor's view history and fetch
> > "similar" view histories from the search engine at runtime?
> >
> >
> > 2014-08-21 18:50 GMT+04:00 Pat Ferrel <[email protected]>:
> >
> > >>
> > >> On Aug 21, 2014, at 1:22 AM, Serega Sheypak <[email protected]>
> > >> wrote:
> > >>
> > >>>> What you are doing is best practice for showing similar “views”. The
> > >> technique for using multiple actions will be covered in a series of
> > >> blog posts and may be put on the Mahout site eventually
> > >> Great thanks!
> > >>
> > >>>> People look at 100 things and buy 1, as you say. The question is: Do
> > >> you want people to buy something or just browse your site?
> > >> No objection to your point, I understand it. It should work for a
> > >> pretty big e-commerce site, right? Small e-commerce sites sell 100-200
> > >> items per day and have a wide range of items.
> > >
> > > Ah, then using Ted’s metrics views _is_ probably your best bet. You can
> > > probably still personalize view recommendations. Since you are already
> > > using itemsimilarity it can be a second step that builds on the first.
> > >
> > >>
> > >>>> Filter out any items not in the catalog from your recommendations.
> > >> We do that at the data preparation stage. We recalculate item
> > >> similarity each day over a sliding 60-day window, excluding unavailable
> > >> items at the preparation stage.
> > >>
> > >> Thank you! We did reach good results, and the business guys are
> > >> satisfied :)
> > >>
> > >>
> > >> 2014-08-20 20:28 GMT+04:00 Pat Ferrel <[email protected]>:
> > >>
> > >>>>
> > >>>> On Aug 19, 2014, at 11:26 PM, Serega Sheypak
> > >>>> <[email protected]> wrote:
> > >>>>
> > >>>> Hi!
> > >>>> 1. There was a bug in the UI; I've checked the raw recommendations.
> > >>>> "Water heating device" has a low score. So the first 30 recommended
> > >>>> items really fit the iPhone, the next ones are not so good. Anyway,
> > >>>> the result is good, thank you very much.
> > >>>> 2. I've inspected users' "sessions": there really are people who
> > >>>> viewed both an iPhone and a heating device, 10 people in the last
> > >>>> month.
> > >>>> 3. I will calculate a relative measurement. I haven't yet calculated
> > >>>> what percentage these people are of all users or how much they
> > >>>> influence the resulting score.
> > >>>>
> > >>>
> > >>> That’s great. The Spark version sorts the result by weights, but I
> > >>> think the mapreduce version doesn't.
> > >>>
> > >>>> *You wrote:*
> > >>>> Then once you have that working you can add more actions but only
> > >>>> with cross-cooccurrence, adding by weighting *will not work with this
> > >>>> type of recommender*. Which recommender can work with weights for
> > >>>> actions?
> > >>>
> > >>> What you are doing is best practice for showing similar “views”. The
> > >>> technique for using multiple actions will be covered in a series of
> > >>> blog posts and may be put on the Mahout site eventually. It requires
> > >>> spark-itemsimilarity. For now I’d strongly suggest you look at
> > >>> training on purchase data alone - see the comments below.
> > >>>
> > >>>>
> > >>>> *About building recommendations using sales.*
> > >>>> Sales are less than 1% of item views. You will recommend only the
> > >>>> stuff people buy.
> > >>>
> > >>> The point is not volume of data but quality of data. I once measured
> > >>> how predictive of purchases the views were and found them a rather
> > >>> poor predictor. People look at 100 things and buy 1, as you say. The
> > >>> question is: Do you want people to buy something or just browse your
> > >>> site?
> > >>>
> > >>> On the other hand you would need to see how good your coverage is of
> > >>> purchases. Do you have enough items purchased by several people
> > >>> (Ted’s questions below will guide you)? If there is good coverage
> > >>> then you _do not_ restrict the range by using only purchase data. You
> > >>> increase the quality.
> > >>>
> > >>>> If you recommend what people view, you significantly widen the range
> > >>>> of possible buy actions. People always buy case "XXX" with an iPhone.
> > >>>> You would never recommend that they buy case "YYY". If people view
> > >>>> both "XXX" and "YYY", it's reasonable to recommend "YYY". Maybe "YYY"
> > >>>> is more expensive, which is why people prefer the cheaper "XXX".
> > >>>> What's wrong with this assumption?
> > >>>
> > >>> Nothing at all. Remember that your goal is to cause a purchase, but
> > >>> using views requires some “scrubbing” of views. You want, in effect,
> > >>> views-that-lead-to-purchases. In a cooccurrence recommender this can
> > >>> be done with cross-cooccurrence. I’ll describe that elsewhere; it’s
> > >>> too long to describe in an email but pretty easy to use.
> > >>>
> > >>> I’d wager that if you restrict to purchases your sales will go up
> > >>> over recommending views. But that is without looking at your data. If
> > >>> you need more data try increasing the sliding time window to add more
> > >>> purchases. This will eventually start including things that are no
> > >>> longer in your catalog, so it will have diminishing returns, but 60
> > >>> days seems like a short time period. Filter out any items not in the
> > >>> catalog from your recommendations.
> > >>>
> > >>> You want recency to matter, this is good intuition. The in-catalog
> > >>> filter is one simple way, and there are others when you get to
> > >>> personalization.
> > >>>
> > >>>>
> > >>>> *About our obsessive desire to add weights for actions.*
> > >>>> We would like to self-tune our recommendations. If a user clicks our
> > >>>> recommendation, it's a signal for us that the items are related. So
> > >>>> next time this link should have a higher score. What are the
> > >>>> approaches for doing this?
> > >>>>
> > >>>
> > >>> Yes, you do want the things that lead to purchases to go into the
> > >>> training data. This is good intuition. But you don’t do it with
> > >>> weights; you train on new purchases, regardless of whether they came
> > >>> from random views, rec-views, or … You don’t care whether a rec was
> > >>> clicked on; you care if a purchase was made and you don’t care what
> > >>> part of the UI caused it. UI analysis is very very important but
> > >>> doesn’t help the recommender, it guides UI decisions. So measuring
> > >>> clicks is good but shouldn’t be used to change recs.
> > >>>
> > >>> One way to increase the value of your recs is to add a little
> > >>> randomness to their ordering. If you have 10 things to recommend get
> > >>> 20 from itemsimilarity and apply a normally distributed random
> > >>> weighting, then re-sort and show the top 10. This will move some
> > >>> things up in order and show them where without the re-ordering they
> > >>> would never be shown. The technique allows you to expose more items
> > >>> to possible purchase and therefore affect the ordering the next time
> > >>> you train. The actual algorithm takes more space to describe but the
> > >>> idea is a lot like a multi-armed bandit where the best bandit
> > >>> eventually gets all trials. In this case the best rec leads to a
> > >>> purchase and gets into the new training data and so will be shown
> > >>> more often the next time.
> > >>>
> > >>> Another thing you can do is create a “shopping cart” recommender.
> > >>> This looks at items purchased together (an item-set). It is a strong
> > >>> indicator of relatedness.
> > >>>
> > >>> Suggestions:
> > >>> 1) personalize: this is likely to make the most difference since you
> > >>> will be showing different things to different people. The “Practical
> > >>> Machine Learning” book is short and easy to read, and it describes
> > >>> this.
> > >>> 2) move to purchase data training, wait for cross-cooccurrence to add
> > >>> in view data. Do this if you have good coverage (Ted’s questions
> > >>> below relate to this).
> > >>> 3) increase the training period if needed to get good catalog
> > >>> coverage
> > >>> 4) consider dithering your recs to expose more items to purchase and
> > >>> therefore self-tune by increasing the quality of your training data.
> > >>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
> > >>>>
> > >>>>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak
> > >>>>> <[email protected]> wrote:
> > >>>>>
> > >>>>>> What could be a reason for recommending "Water heat device" for
> > >>>>>> iPhone? iPhone is one of the most popular items. Does that mean a
> > >>>>>> lot of people view iPhone together with "Water heat device"?
> > >>>>>>
> > >>>>>
> > >>>>> What are the numbers?
> > >>>>>
> > >>>>> How many people got each item?  How many people total?  How many
> > >>>>> people got both?
> > >>>>>
> > >>>>> What about the same for the iPhone related items?
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
>
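
Ted's three counts quoted at the bottom of the thread (how many people got
each item, how many people total, how many got both) are exactly what a 2x2
co-occurrence table needs. Below is a minimal Python sketch of turning them
into a score, assuming the log-likelihood ratio (LLR) is the measure of
interest. The thread never says which similarity measure the job was run
with, so that choice is an assumption, and the function names and all example
numbers except the "10 people viewed both" figure are made up for
illustration.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def cooccurrence_llr(n_a, n_b, n_both, n_total):
    # Build the 2x2 table from Ted's three questions:
    #   n_a, n_b -> how many people got each item
    #   n_total  -> how many people in total
    #   n_both   -> how many people got both
    k11 = n_both                        # got A and B
    k12 = n_a - n_both                  # got A but not B
    k21 = n_b - n_both                  # got B but not A
    k22 = n_total - n_a - n_b + n_both  # got neither
    return llr(k11, k12, k21, k22)


if __name__ == "__main__":
    # Hypothetical numbers, except that 10 people viewed both items.
    print(cooccurrence_llr(n_a=5000, n_b=40, n_both=10, n_total=100000))
```

A score near zero means the two items co-occur about as often as chance
would predict; comparing the score for the odd pair against the scores for
the obviously related items is one way to read "what about the same for the
iPhone related items?".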

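The dithering step Pat describes (fetch 20, add normally distributed noise,
re-sort, show 10) might look roughly like the Python below. Pat notes that the
actual algorithm takes more space to describe, so this is only a simplified
sketch: adding the noise to the log of the rank, the sigma value, and all
names here are assumptions, not something taken from the thread or from
Mahout.

```python
import math
import random


def dither(ranked_items, show=10, sigma=0.5, rng=None):
    """Re-order an already ranked list with Gaussian noise and keep the top.

    ranked_items: best-first list, e.g. the top 20 from itemsimilarity.
    show:         how many items to display after re-sorting.
    sigma:        noise level (hypothetical default); 0 disables dithering.
    rng:          optional random.Random, useful for repeatable tests.
    """
    rng = rng or random.Random()
    noisy = []
    for rank, item in enumerate(ranked_items, start=1):
        # Assumption: perturb log(rank), so the top of the list moves less
        # than the tail for the same amount of noise.
        score = math.log(rank) + rng.gauss(0.0, sigma)
        noisy.append((score, item))
    # Lower score means better position, as with the original rank.
    noisy.sort(key=lambda pair: pair[0])
    return [item for _, item in noisy[:show]]


if __name__ == "__main__":
    # Hypothetical item ids standing in for 20 itemsimilarity results.
    candidates = ["item-%02d" % i for i in range(1, 21)]
    print(dither(candidates, show=10, sigma=0.5))
```

Items that were ranked 11 to 20 now occasionally reach the visible top 10,
so they get a chance to be purchased and enter the next training run.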