On Wed, Dec 5, 2012 at 6:57 AM, Paulo Villegas <[email protected]> wrote:
> On 05/12/12 00:53, Ted Dunning wrote:
>
>> Also, you have to separate UI considerations from algorithm
>> considerations. What algorithm populates the recommendations is the
>> recommender algorithm. It has two responsibilities... first, find items
>> that the users will like and second, pick out a variety of less certain
>> items to learn about. It is not responsible for justifying choices to
>> the user. The UI does that and it may use analytics of some kind to
>> make claims about choices made, but that won't change the choices.
>
> Here I disagree: explaining recommendations to the user is an important
> factor in user acceptance (and therefore uptake) of the results, since
> if she can understand why some completely unknown item was recommended
> it'll make her more confident that it's a good choice (this has also
> been proven experimentally).

I have demonstrated that explanations help as well in some cases. Not in
all.

> And the best one to know why something was recommended is the engine
> itself.

This is simply not true. The engine may have very complex reasons for a
recommendation. This applies in classification as well. It is completely
conventional, and often critical to performance, to have one engine for
recommendation or classification and a completely independent one for
explanation.

> That's one good additional reason why item-based neighbourhood is more
> advantageous than user-based: you can communicate item neighbours to the
> user, who then sees items she knows that are similar to the one being
> recommended (it's one of the things Amazon does in its recommendation
> lists).

Again, this simply isn't that important. The major goal of the
recommendation engine is to produce high-quality recommendations, and one
of the major problems in doing that is avoiding noise effects. Ironically,
it is also important for the recommendation engine to inject metered
amounts of a different kind of noise as well. Neither of those
capabilities makes sense to explain to the user, and they may actually
dominate the decisions.

Once an explainer is given a clean set of recommendations, the problem of
explaining is vastly different from the job of recommending. For instance,
Tanimoto or Jaccard similarity is horrible for recommendation but great
for explaining. The issue is that the explainer doesn't have to explain
all of the items that are *not* shown, only those which are shown.
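To make that concrete, here is a rough sketch of the kind of explainer I
mean. It is only an illustration: the item names, user ids, and function
names are invented, and it assumes the recommendations themselves have
already been chosen by a completely separate engine.

# Rough sketch: given a recommendation chosen by some other engine, pick
# "because you liked ..." evidence using Jaccard similarity between the
# recommended item and items already in the user's history.
# The interaction data below is invented for illustration.

def jaccard(users_a, users_b):
    """Jaccard similarity between the user sets of two items."""
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0

# item -> set of users who interacted with it (toy data)
item_users = {
    "anna karenina":  {"u1", "u2", "u3", "u5"},
    "war and peace":  {"u1", "u2", "u4"},
    "cooking basics": {"u6", "u7"},
}

def explain(recommended, user_history, top_n=2):
    """Return the user's known items most similar to the recommended one."""
    scored = [
        (jaccard(item_users[recommended], item_users[known]), known)
        for known in user_history
        if known in item_users
    ]
    scored.sort(reverse=True)
    return [known for score, known in scored[:top_n] if score > 0]

# explain("anna karenina", {"war and peace", "cooking basics"})
# -> ["war and peace"]

The explainer only ever has to score the handful of items actually being
shown against the user's own history, so it can use whatever measure
communicates best, independent of how the recommender made its choices.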
Note that Amazon does not actually explain their market basket
recommendations. And in their personal recommendations (which they have
partially hidden now), you have to ask for the explanation. The
explanation that they give is typically one or two of your actions, which
is patently not a complete explanation. So they are clearly saying one
thing and doing another, just as I am recommending here.

> Speaking about Amazon, the "also bought" UI thing is still there in
> their website, only *not* in their specific recommendation lists.

But note that they don't give percentages any more. Also note that they
don't explain all of the things that they *don't* show you.

> It's down in the page, in sections like "Continue Shopping: Customers
> Who Bought Items in Your Recent History Also Bought". It does not give
> % values now, but it's essentially the same (and it works also when you
> are not logged in, since it is using your recent viewing history).
> That's why I thought it's coming from Market Basket Analysis (i.e.
> frequent itemsets).

I doubt it seriously. Frequent itemset mining is typically much more
expensive than simple recommendations.

> Lift is indeed a good metric for the interestingness of a rule, but it
> can also produce unreasonably big values for rare itemsets. On the other
> hand, maybe this is good for uncovering long tail associations.

I have built a number of commercially successful recommendation engines,
and simple overlap has always been a complete disaster. I have also
counseled a number of companies along the lines given here, and the
numbers they achieved when they switched to roughly what I am describing
have been quite striking. The only time that overlap is likely to work is
if you have absolutely massive data and can afford very high thresholds,
and that completely obliterates the long tail.

You can claim to understand a system like Amazon's from the UI, but I
seriously doubt that you are seeing 5% of what the recommendation engine
is really doing.
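As an aside, the rare-itemset problem with lift that you mention is easy
to see with invented counts. This is only toy arithmetic, not a
description of any particular system:

# Toy illustration (invented counts) of why lift explodes for rare items.
# lift(A, B) = P(A and B) / (P(A) * P(B))

def lift(count_ab, count_a, count_b, total):
    p_ab = count_ab / total
    return p_ab / ((count_a / total) * (count_b / total))

total = 1_000_000  # total baskets

# Two popular items that co-occur fairly often: a modest, believable lift.
print(lift(count_ab=2_000, count_a=50_000, count_b=40_000, total=total))
# -> about 1.0

# Two items each bought exactly once, in the same basket: an astronomical
# lift driven entirely by a single co-occurrence, which is probably noise.
print(lift(count_ab=1, count_a=1, count_b=1, total=total))
# -> about 1,000,000

Raw overlap counts suffer from the same sparse-count noise, which is why
they only start to behave with massive data and high thresholds, and those
thresholds are exactly what kill the long tail.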
