Re: Mahout Amazon EMR usage cost

Ted Dunning Tue, 04 Dec 2012 15:54:10 -0800

Also, you have to separate UI considerations from algorithm considerations.
The recommender algorithm is what populates the recommendations. It has two
responsibilities: first, find items that the users will like, and second,
pick out a variety of less certain items to learn about. It is not
responsible for justifying choices to the user. The UI does that, and it
may use analytics of some kind to make claims about the choices made, but
that won't change the choices.
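
As a rough illustration of that split, here is a minimal sketch. It is
not something specified in this thread: the function name `recommend`, its
parameters, and the "reserve a few slots for random lower-ranked items"
policy are all assumptions chosen for brevity. Note that nothing in it
produces an explanation for the user; that would live in the UI layer.

```python
import random

def recommend(scores, n_slots=5, n_explore=1, rng=None):
    """Pick n_slots items from scores ({item: predicted preference}).

    Hypothetical policy: fill most slots with the highest-scoring items
    (items the user will probably like) and the remaining n_explore slots
    with randomly sampled lower-ranked items (less certain items that the
    system can learn about).
    """
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_exploit = max(n_slots - n_explore, 0)
    exploit = ranked[:n_exploit]
    rest = ranked[n_exploit:]
    # Sample without replacement from everything not already chosen.
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit + explore

# Toy scores, invented for the example.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.2, "e": 0.1, "f": 0.05}
picks = recommend(scores, n_slots=4, n_explore=1, rng=random.Random(0))
```

The first three picks are always the top-scored items; the last slot varies
with the random seed.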


On Wed, Dec 5, 2012 at 12:48 AM, Sean Owen <[email protected]> wrote:

> Yes it's not a recommender problem, it's a most-similar-items
> problem. Frequent itemset mining is really just a most-similar-items
> algorithm, for one particular definition of similar (confidence). In
> that sense they are nearly the same.
>
> For frequent itemsets, you have to pick a minimum support -- that's an
> extra parameter to figure out, but that is precisely what speeds up
> frequent itemset mining. But, it will also mean you have no answer for
> long-tail items since they are excluded by the min support.
>
> If min support is 0, then you run into a different issue. For an item
> that was bought by 1 person, anything else bought by that person has
> confidence 1; confidence loses its ability to discriminate. I just looked it up on
> Wikipedia and found there's an idea of "lift" which would be better to
> rank on. It's essentially a likelihood ratio. Which takes you right
> back to Ted's advice to just find items with highest (log-)likelihood
> similarity.
>
> For these reasons I also suspect that this is not actually how Amazon
> et al determine which items to show you. In fact, I don't see any such
> "70% of users also bought..." figures on their site now?
>
>
> On Tue, Dec 4, 2012 at 10:51 PM, Paulo Villegas <[email protected]> wrote:
> > While the "70% of users bought also ... " could be generated by a
> > suitable recommendation engine, I think it fits better with a frequent
> > pattern mining approach i.e. Association Rules. I don't know if Amazon
> > implements it that way, but it seems likely, since it's not really a
> > personalized recommendation (unless we interpret the personalization as
> > coming from the pages the user is visiting, i.e. real-time profile
> > building).
> >
> > I believe Mahout has a frequent itemset mining algorithm (FPGrowth),
> > though I've never tried it myself. For your problem, you would select
> > the minimum support for your itemsets (this would eliminate spurious
> > associations), and the confidence obtained would be directly your 70%
> > value.
> >
> > Although your formulation selects only the rules with 1 item in the
> > antecedent, i.e. item1 -> item2, you could use the items visited before
> > to build bigger antecedents.
> >
> > Paulo
>
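
The points quoted above about minimum support, confidence, and lift can be
made concrete with a small sketch. The baskets, the function names
`pair_stats` and `llr`, and the item labels are invented for illustration.
The `llr` function uses the usual G^2 form of the log-likelihood ratio on a
2x2 co-occurrence table; whether it matches Mahout's implementation term
for term is not checked here.

```python
from collections import Counter
from itertools import permutations
from math import log

# Hypothetical toy data: one set of purchased items per user.
baskets = [
    {"book", "lamp"},
    {"book", "lamp"},
    {"book", "pen"},
    {"book"},
    {"rare", "lamp"},  # "rare" was bought by exactly one user
]

def pair_stats(baskets, min_support=1):
    """Return {(a, b): (support, confidence, lift)} for rules a -> b."""
    n = len(baskets)
    item_count = Counter(item for b in baskets for item in b)
    pair_count = Counter(
        p for b in baskets for p in permutations(sorted(b), 2)
    )
    stats = {}
    for (a, b), both in pair_count.items():
        if both < min_support:
            continue  # pruned: this is where long-tail items disappear
        confidence = both / item_count[a]        # P(b | a)
        lift = confidence / (item_count[b] / n)  # P(b | a) / P(b)
        stats[(a, b)] = (both, confidence, lift)
    return stats

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 table of counts.

    k11: both items, k12: first only, k21: second only, k22: neither.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    return 2.0 * sum(
        k * log(k * n / (rows[i] * cols[j])) for k, i, j in cells if k > 0
    )

stats = pair_stats(baskets)                 # min support effectively off
strict = pair_stats(baskets, min_support=2)
```

With pruning off, the rule rare -> lamp gets confidence 1.0 from a single
purchase. With `min_support=2`, only the book/lamp rules survive and "rare"
has no answer at all. The same perfect association scores higher under `llr`
when more observations back it: `llr(10, 0, 0, 10)` is larger than
`llr(1, 0, 0, 1)`.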
