Re: mapreduce ItemSimilarity input optimization

Serega Sheypak Thu, 21 Aug 2014 08:10:57 -0700

Excuse me, it looks like I missed an important point:
"Ah, then using Ted’s metrics views _is_ probably your best bet."
Are you talking about serving "personal recommendations" from a search
engine? Is the idea to take the active visitor's view history and return
"similar" view histories from the search engine at runtime?



2014-08-21 18:50 GMT+04:00 Pat Ferrel <[email protected]>:

> >
> > On Aug 21, 2014, at 1:22 AM, Serega Sheypak <[email protected]>
> wrote:
> >
> >>> What you are doing is best practice for showing similar “views”. The
> > technique for using multiple actions will be covered in a series of blog
> > posts and may be put on the Mahout site eventually
> > Great thanks!
> >
> >>> People look at 100 things and buy 1, as you say. The question is: Do
> you
> > want people to buy something or just browse your site?
> > No objection to your point; I understand it. It should work for a pretty
> > big ecom, right? A small ecom sells 100-200 items per day and has a wide
> > range of items.
>
> Ah, then using Ted’s metrics views _is_ probably your best bet. You can
> probably still personalize view recommendations. Since you are already
> using itemsimilarity it can be a second step that builds on the first.
>
> >
> >>> Filter out any items not in the catalog from your recommendations.
> > We do this at the data preparation stage. We recalculate item similarity
> > each day over a sliding 60-day window, excluding unavailable items during
> > preparation.
> >
> > Thank you! We did reach good results, business guys got satisfaction :)
> >
> >
> > 2014-08-20 20:28 GMT+04:00 Pat Ferrel <[email protected]>:
> >
> >>>
> >>> On Aug 19, 2014, at 11:26 PM, Serega Sheypak <[email protected]
> >
> >> wrote:
> >>>
> >>> Hi!
> >>> 1. There was a bug in the UI; I've checked the raw recommendations. "water
> >>> heating device" has a low score. So the first 30 recommended items really
> >>> fit the iPhone, the next ones are not so good. Anyway, the result is good,
> >>> thank you very much.
> >>> 2. I've inspected user "sessions"; there really are people who viewed
> >>> an iPhone and a heating device: 10 people in the last month.
> >>> 3. I will calculate a relative measurement. I haven't yet calculated what %
> >>> of all users these people are, or how they influence the resulting score.
> >>>
> >>
> >> That’s great. The Spark version sorts the result by weights, but I think
> >> the mapreduce version doesn't
> >>
> >>> *You wrote:*
> >>> "Then once you have that working you can add more actions but only with
> >>> cross-cooccurrence, adding by weighting *will not work with this type of
> >>> recommender*." Which recommender can work with weights for actions?
> >>
> >> What you are doing is best practice for showing similar “views”. The
> >> technique for using multiple actions will be covered in a series of blog
> >> posts and may be put on the Mahout site eventually. It requires
> >> spark-itemsimilarity. For now I’d strongly suggest you look at training on
> >> purchase data alone - see the comments below.
> >>
> >>>
> >>> *About building recommendations using sales.*
> >>> Sales are less than 1% of item views. You would recommend only stuff
> >>> people buy.
> >>
> >> The point is not volume of data but quality of data. I once measured how
> >> predictive of purchases the views were and found them a rather poor
> >> predictor. People look at 100 things and buy 1, as you say. The question
> >> is: Do you want people to buy something or just browse your site?
> >>
> >> On the other hand you would need to see how good your coverage is of
> >> purchases. Do you have enough items purchased by several people (Ted’s
> >> questions below will guide you)? If there is good coverage then you _do
> >> not_ restrict the range by using only purchase data. You increase the
> >> quality.
> >>
> >>> If you recommend what people view, you significantly widen the range
> >>> of possible buy actions. People always buy case "XXX" with an iPhone. You
> >>> would never recommend that they buy case "YYY". If people view both "XXX"
> >>> and "YYY", it's reasonable to recommend "YYY". Maybe "YYY" is more
> >>> expensive, which is why people prefer the cheaper "XXX". What's wrong with
> >>> this assumption?
> >>
> >> Nothing at all. Remember that your goal is to cause a purchase but using
> >> views requires some “scrubbing” of views. You want, in effect,
> >> views-that-lead-to-purchases. In a cooccurrence recommender this can be
> >> done with cross-cooccurrence and I’ll describe that elsewhere, it’s too
> >> long for an email to describe but pretty easy to use.
> >>
> >> I’d wager that if you restrict to purchases your sales will go up over
> >> recommending views. But that is without looking at your data. If you need
> >> more data, try increasing the sliding time window to add more purchases.
> >> This will eventually start including things that are no longer in your
> >> catalog, so it will have diminishing returns, but 60 days seems like a
> >> short time period.
> >> Filter out any items not in the catalog from your recommendations.
> >>
> >> You want recency to matter, this is good intuition. The in-catalog
> filter
> >> is one simple way, and there are others when you get to personalization.
> >>
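A minimal sketch of the two filters discussed above (the sliding training window and the in-catalog filter), in Python. The event tuple layout, the function names and the sample data are assumptions made for illustration; they are not part of Mahout or of anything in this thread.

```python
from datetime import date, timedelta

def recent_events(events, today, window_days=60):
    """Keep (user, item, day) events that fall inside the sliding window."""
    cutoff = today - timedelta(days=window_days)
    return [e for e in events if e[2] >= cutoff]

def filter_to_catalog(recs, catalog):
    """Drop recommended (item, score) pairs whose item is no longer sold."""
    return [(item, score) for item, score in recs if item in catalog]

# Hypothetical sample data.
today = date(2014, 8, 21)
events = [
    ("u1", "iphone", date(2014, 8, 1)),
    ("u2", "old_phone", date(2014, 5, 1)),  # older than the 60-day window
]
catalog = {"iphone", "case_xxx"}
recs = [("case_xxx", 0.9), ("discontinued_case", 0.7)]

kept_events = recent_events(events, today)   # input for the daily retraining
kept_recs = filter_to_catalog(recs, catalog) # applied to the output recs
```

Widening the window only means passing a larger window_days; the catalog filter on the output then keeps discontinued items from being shown.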
> >>>
> >>> *About our obsessive desire to add weights for actions.*
> >>> We would like to self-tune our recommendations. If a user clicks our
> >>> recommendation, it's a signal for us that the items are related. So next
> >>> time this link should have a higher score. What are the approaches to do
> >>> this?
> >>>
> >>
> >> Yes, you do want the things that lead to purchases to go into the training
> >> data. This is good intuition. But you don’t do it with weights; you train
> >> on new purchases, regardless of whether they came from random views,
> >> rec-views, or … You don’t care whether a rec was clicked on; you care if a
> >> purchase was made and you don’t care what part of the UI caused it. UI
> >> analysis is very very important but doesn’t help the recommender, it guides
> >> UI decisions. So measuring clicks is good but shouldn’t be used to change
> >> recs.
> >>
> >> One way to increase the value of your recs is to add a little randomness
> >> to their ordering. If you have 10 things to recommend get 20 from
> >> itemsimilarity and apply a normally distributed random weighting, then
> >> re-sort and show the top 10. This will move some things up in order and
> >> show them where without the re-ordering they would never be shown. The
> >> technique allows you to expose more items to possible purchase and
> >> therefore affect the ordering the next time you train. The actual
> algorithm
> >> takes more space to describe but the idea is a lot like a multi-armed
> >> bandit where the best bandit eventually gets all trials. In this case
> the
> >> best rec leads to a purchase and gets into the new training data and so
> >> will be shown more often the next time.
> >>
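A rough Python sketch of the dithering idea described above: take 20 candidates, add normally distributed noise, re-sort, show the top 10. The noise model used here (Gaussian noise added to the log of the rank) and the sigma value are assumptions; the message says the actual algorithm takes more space to describe, so treat this as one possible reading rather than the method itself.

```python
import math
import random

def dither(ranked_items, k=10, sigma=0.5, rng=None):
    """Return k items from a best-first list after a noisy re-sort."""
    rng = rng or random.Random()
    noisy = []
    for rank, item in enumerate(ranked_items, start=1):
        # Assumption: log(rank) keeps the top of the list fairly stable
        # while letting lower-ranked items occasionally move up.
        key = math.log(rank) + rng.gauss(0.0, sigma)
        noisy.append((key, item))
    noisy.sort(key=lambda pair: pair[0])  # smallest noisy key first
    return [item for _, item in noisy[:k]]

# Hypothetical candidates: 20 items from itemsimilarity, best first.
candidates = [f"item{i}" for i in range(1, 21)]
shown = dither(candidates, k=10, sigma=0.5, rng=random.Random(42))
```

With sigma=0 the original order comes back unchanged; larger values let more of the lower-ranked items into the visible top 10, which is what gives them a chance to be purchased and enter the next training set.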
> >> Another thing you can do is create a “shopping cart” recommender. This
> >> looks at items purchased together—an item-set. It is a strong indicator
> of
> >> relatedness.
> >>
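A minimal sketch of the "shopping cart" idea in Python: count how often two items appear in the same order and use those counts to suggest companions. The order data and function names are hypothetical, and raw pair counts are used only for illustration; they favor popular items, so a real recommender would likely score the pairs rather than just count them.

```python
from collections import Counter
from itertools import combinations

def cart_cooccurrence(orders):
    """Count how many orders contain each unordered pair of items."""
    pair_counts = Counter()
    for items in orders:
        # sorted(set(...)) gives each pair one canonical (a, b) form
        # and ignores duplicates within a single order.
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def bought_with(item, pair_counts, top_n=5):
    """Items most often purchased in the same order as `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

# Hypothetical orders (item-sets).
orders = [
    ["iphone", "case_xxx"],
    ["iphone", "case_xxx", "charger"],
    ["iphone", "charger"],
    ["case_yyy", "charger"],
]
pairs = cart_cooccurrence(orders)
companions = bought_with("iphone", pairs)
```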
> >> Suggestions:
> >> 1) personalize: this is likely to make the most difference since you
> will
> >> be showing different things to different people. The “Practical Machine
> >> Learning” is short and easy to read—it describes this.
> >> 2) move to purchase data training, wait for cross-cooccurrence to add in
> >> view data. Do this if you have good coverage (Ted’s questions below
> relate
> >> to this).
> >> 3) increase the training period if needed to get good catalog coverage
> >> 4) consider dithering your recs to expose more items to purchase and
> >> therefore self-tune by increasing the quality of your training data.
> >>
> >>>
> >>>
> >>>
> >>> 2014-08-20 7:18 GMT+04:00 Ted Dunning <[email protected]>:
> >>>
> >>>> On Tue, Aug 19, 2014 at 12:53 AM, Serega Sheypak <
> >> [email protected]
> >>>>>
> >>>> wrote:
> >>>>
> >>>>> What could be the reason for recommending "Water heat device" for an
> >>>>> iPhone? The iPhone is one of the most popular items. Does that mean a
> >>>>> lot of people view the iPhone together with "Water heat device"?
> >>>>>
> >>>>
> >>>> What are the numbers?
> >>>>
> >>>> How many people got each item?  How many people total?  How many
> people
> >> got
> >>>> both?
> >>>>
> >>>> What about the same for the iPhone related items?
> >>>>
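The numbers asked for above (how many people got each item, how many people in total, how many got both) can be pulled from the view histories with a few lines of Python. This is only a sketch: the layout of the histories dict (user id mapped to a set of viewed items) and the sample data are assumptions.

```python
def cooccurrence_counts(histories, item_a, item_b):
    """Counts for one item pair: total users, users with A, with B, with both."""
    total = len(histories)
    n_a = sum(1 for items in histories.values() if item_a in items)
    n_b = sum(1 for items in histories.values() if item_b in items)
    n_both = sum(
        1 for items in histories.values() if item_a in items and item_b in items
    )
    return {"total": total, "a": n_a, "b": n_b, "both": n_both}

# Hypothetical view histories: user id -> set of viewed items.
histories = {
    "u1": {"iphone", "case_xxx"},
    "u2": {"iphone", "water_heater"},
    "u3": {"iphone"},
    "u4": {"water_heater"},
    "u5": {"case_xxx"},
}
counts = cooccurrence_counts(histories, "iphone", "water_heater")
# Relative measure: share of iPhone viewers who also viewed the heater.
share_of_iphone_viewers = counts["both"] / counts["a"]
```

Running the same function for the iPhone and each of its other recommended items gives the comparison asked for in the last question.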
> >>>
> >>
> >
>
