I don't disagree at all with what you're saying. I never said (or intended to say) that explanations would have to be a thorough dump of the engine's internal computation; that would not make sense to the user and would just overwhelm them. Picking out a couple of representative items would be more than enough.

And if the original algorithm is too complicated, then yes, it may make sense to bring up an additional, simpler and more understandable engine just to generate explanations. But then you need to ensure that the explanations fit well with the results you're actually delivering. And in any case, if you've got that additional engine and it works sensibly, you could just as well aggregate its results into the main system and build up an ensemble. It may not work in all cases, but may do well in others. YMMV.

I'm also not saying I know exactly what Amazon is doing internally; you need a lot more than a casual look at the UI to infer that. They could be doing frequent itemset mining, or they might not. But I maintain it can be a valid approach. A recommendation coming from association rules will have less coverage than a "standard" CF engine, and will probably miss a bigger part of the long tail, but it is perfectly well suited to the goal of enlarging the basket of items the user is willing to buy in a single transaction (i.e. don't find "the next best item", find "the item that goes along well with this one").
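To make the "item that goes along well with this one" idea concrete, here is a minimal sketch of that kind of co-occurrence recommender, using toy baskets and made-up item names (nothing here is Amazon's actual algorithm): count how often pairs of items appear in the same transaction, then rank companions of a given item by the confidence P(other | item).

```python
from collections import Counter
from itertools import combinations

# Toy transactions: each is one basket of item ids (hypothetical data).
transactions = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"laptop", "mouse"},
    {"camera", "tripod"},
    {"laptop", "mouse", "sd_card"},
]

# Count how often each item, and each unordered pair, occurs in a basket.
item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

def goes_well_with(item, min_support=2):
    """Items frequently bought together with `item`, ranked by confidence
    P(other | item) -- the 'goes along well' view, not 'next best item'."""
    scored = []
    for pair, n in pair_counts.items():
        if item in pair and n >= min_support:
            other = next(i for i in pair if i != item)
            scored.append((other, n / item_counts[item]))
    return sorted(scored, key=lambda t: -t[1])

print(goes_well_with("camera"))
```

As noted above, coverage is limited: any pair below the support threshold simply never gets recommended, which is exactly the long-tail loss being discussed.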

And if you model transactions adequately (e.g. items viewed in a single browsing session, where you might assume the user has a single main intent, as opposed to coming back the next day with a different thing in mind), then it might help to filter out spurious associations (of the kind you sometimes see on Amazon anyway). Of course, a similar effect can be achieved with a "standard" recommender engine if you introduce time effects.



On Wed, Dec 5, 2012 at 6:57 AM, Paulo Villegas <[email protected]> wrote:

On 05/12/12 00:53, Ted Dunning wrote:

Also, you have to separate UI considerations from algorithm considerations.
What algorithm populates the recommendations is the recommender algorithm.
It has two responsibilities: first, find items that the users will like,
and second, pick out a variety of less certain items to learn about. It is
not responsible for justifying choices to the user. The UI does that, and
it may use analytics of some kind to make claims about choices made, but
that won't change the choices.


Here I disagree: explaining recommendations to the user is an important
factor in user acceptance (and therefore uptake) of the results, since if
she can understand why some completely unknown item was recommended it'll
make her more confident that it's a good choice (this has also been proven
experimentally).


I have demonstrated that explanations help as well in some cases. Not in all.


And the best one to know why something was recommended is the engine itself.


This is simply not true.  The engine may have very complex reasons for
recommendation.  This applies in classification as well.  It is completely
conventional, and often critical to performance to have one engine for
recommendation or classification and a completely independent one for
explanation.


That's one good additional reason why item-based neighbourhood is more
advantageous than user-based: you can communicate item neighbours to the
user, who then sees items she knows that are similar to the one being
recommended (it's one of the things Amazon does in its recommendation
lists).


Again.  This simply isn't that important.  The major goal of the
recommendation engine is to produce high quality recommendations and one of
the major problems in doing that is avoiding noise effects.  Ironically, it
is also important for the recommendation engine to inject metered amounts
of a different kind of noise as well. Neither of those capabilities makes
sense to explain to the user, and these may actually dominate the decisions.
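One common form of that deliberately injected noise is dithering: reorder the ranked list by sorting on log(rank) plus Gaussian noise, so top items mostly stay on top while deeper items occasionally surface for the engine to learn from. This is a generic sketch of the idea, not necessarily the exact scheme meant above; `epsilon` controls how much shuffling you tolerate.

```python
import math
import random

def dither(ranked_items, epsilon=0.5, seed=None):
    """Reorder a ranked recommendation list by sorting on
    log(rank) + Gaussian noise.  With epsilon=0 the order is unchanged;
    larger epsilon surfaces deeper items more often (exploration)."""
    rng = random.Random(seed)
    keyed = [(math.log(rank + 1) + rng.gauss(0, epsilon), item)
             for rank, item in enumerate(ranked_items)]
    return [item for _, item in sorted(keyed)]

print(dither(list("ABCDEFGH"), epsilon=0.5, seed=42))
```

There is clearly nothing here worth explaining to the user, yet it changes which items are shown.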

Once an explainer is given a clean set of recommendations, then the problem
of explaining is vastly different than the job of recommending.  For
instance Tanimoto or Jaccard are horrible for recommendation but great for
explaining.  The issue is that the explainer doesn't have to explain all of
the items that are *not* shown, only those which are shown.
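A sketch of that separation, with hypothetical data: the explainer receives an already-chosen recommendation and simply surfaces the user's own items with the highest Jaccard overlap to it, regardless of how the recommender actually picked it.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of users who interacted with an item."""
    return len(a & b) / len(a | b) if a | b else 0.0

def explain(recommended, user_history, item_users, top_n=2):
    """Explain one already-chosen recommendation via the user's own items
    most similar to it.  The explainer only covers items that *are* shown,
    so it can use a simple, legible measure like Jaccard even if the
    recommender itself works in an entirely different way."""
    return sorted(user_history,
                  key=lambda known: jaccard(item_users[recommended],
                                            item_users[known]),
                  reverse=True)[:top_n]

# Hypothetical item -> users-who-bought-it data.
item_users = {
    "lens":   {"u1", "u2", "u3"},
    "camera": {"u1", "u2", "u3", "u4"},
    "novel":  {"u5"},
}
print(explain("lens", ["camera", "novel"], item_users))  # ['camera', 'novel']
```

The "because you bought camera" message falls out of the explainer, not the recommender, which is exactly the division of labour described above.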

Note that Amazon does not actually explain their market basket
recommendations.  And in their personal recommendations (which they have
partially hidden now), you have to ask for the explanation.  The
explanation that they give is typically one or two of your actions which is
patently not a complete explanation.  So they clearly are saying one thing
and doing another, just as I am recommending here.


Speaking about Amazon, the "also bought" UI thing is still there on their
website, only *not* in their specific recommendation lists.


But note that they don't give percentages any more.  Also note that they
don't explain all of the things that they *don't* show you.


It's further down the page, in sections like "Continue Shopping: Customers
Who Bought Items in Your Recent History Also Bought". It does not give %
values now, but it's essentially the same (and it also works when you are
not logged in, since it is using your recent viewing history). That's why I
thought it was coming from Market Basket Analysis (i.e. frequent itemsets).


I doubt it seriously.  Frequent itemset mining is typically much more
expensive than simple recommendations.


Lift is indeed a good metric for the interestingness of a rule, but it can
also produce unreasonably large values for rare itemsets. On the other hand,
maybe this is good for uncovering long-tail associations.
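The rare-itemset blow-up is easy to see from the definition, lift(A→B) = P(A,B) / (P(A)·P(B)), estimated from counts over hypothetical basket data:

```python
def lift(n_ab, n_a, n_b, n_total):
    """lift(A -> B) = P(A,B) / (P(A) * P(B)), estimated from counts."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

# Common pair: each item in 10% of 10,000 baskets, together in 2% of them.
print(lift(200, 1000, 1000, 10000))   # ~ 2.0

# Rare pair: two one-off items that happened to share their single basket.
print(lift(1, 1, 1, 10000))           # ~ 10000.0
```

A single shared basket between two singleton items yields a lift equal to the whole transaction count, so a minimum-support cutoff (or a smoothed estimate) is usually needed before trusting the ranking.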


I have built a number of commercially successful recommendation engines and
simple overlap has always been a complete disaster.  I have also counseled
a number of companies along the lines given here and the resulting numbers
that they have achieved have been quite striking when they switched to
roughly what I am describing here.

The only time the overlap is likely to work is if you have absolutely
massive data and can afford very high thresholds.  That completely
obliterates the long tail.

You can claim to understand a system like Amazon's from the UI, but I would
seriously doubt that you are seeing 5% of what the recommendation engine is
really doing.

