Yes, it's not a recommender problem; it's a most-similar-items
problem. Frequent itemset mining is really just a most-similar-items
algorithm, for one particular definition of similarity (confidence). In
that sense they are nearly the same.

For frequent itemsets, you have to pick a minimum support -- that's an
extra parameter to figure out, but it is precisely what speeds up
frequent itemset mining. It also means you have no answer for
long-tail items, since they are excluded by the min support.
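
To make the long-tail point concrete, here is a minimal sketch in
Python. It is not Mahout's FPGrowth, just a naive pair counter with the
same kind of pruning; the baskets and the frequent_pairs helper are
made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one user's purchases.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "eggs"},
    {"caviar", "milk"},  # "caviar" is the long-tail item
]

def frequent_pairs(baskets, min_support):
    """Count item pairs, keeping only those seen in >= min_support baskets."""
    item_counts = Counter(item for b in baskets for item in b)
    # Pruning step: an infrequent item cannot be part of a frequent pair,
    # so it is dropped before any pairs are generated. This is the speed-up.
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for b in baskets:
        kept = sorted(i for i in b if i in frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c for p, c in pair_counts.items() if c >= min_support}

pairs = frequent_pairs(baskets, min_support=2)
covered = {item for pair in pairs for item in pair}
print(pairs)
print("caviar" in covered)
```

With min_support=2, "caviar" never appears in any surviving pair, so
there is nothing to show for it. Dropping min_support to 1 brings it
back, at the cost of counting every pair.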

If min support is 0, then you run into a different issue. For an item
that was bought by only 1 person, anything else that person bought has
confidence 1; confidence loses its ability to discriminate. I just
looked it up on Wikipedia and found there's an idea of "lift", which
would be better to rank on. It's essentially a likelihood ratio, which
takes you right back to Ted's advice to just find items with highest
(log-)likelihood similarity.
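
A small sketch of the difference, assuming the usual textbook
definitions: confidence(A -> B) = count(A and B) / count(A), and
lift(A -> B) = confidence(A -> B) / P(B). The baskets and function
names below are hypothetical, chosen only to illustrate the point.

```python
# Hypothetical toy data: "caviar" was bought by exactly one person.
baskets = [
    {"caviar", "milk", "blini"},
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

def count(baskets, *items):
    """Number of baskets containing all of the given items."""
    wanted = set(items)
    return sum(1 for b in baskets if wanted <= b)

def confidence(baskets, a, b):
    """Fraction of baskets with a that also contain b."""
    return count(baskets, a, b) / count(baskets, a)

def lift(baskets, a, b):
    """Confidence of a -> b divided by the overall frequency of b."""
    return confidence(baskets, a, b) / (count(baskets, b) / len(baskets))

for other in ("milk", "blini"):
    print(other, confidence(baskets, "caviar", other), lift(baskets, "caviar", other))
```

Both "milk" and "blini" get confidence 1 for "caviar", so confidence
cannot rank them. Lift does separate them (about 1.25 versus 5),
because it discounts "milk" for being popular with everyone. With a
single observation neither number is very trustworthy, though, which
is part of the appeal of the log-likelihood approach.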

For these reasons I also suspect that this is not actually how Amazon
et al. determine which items to show you. In fact, I don't see any such
"70% of users also bought..." figures on their site now?


On Tue, Dec 4, 2012 at 10:51 PM, Paulo Villegas <[email protected]> wrote:
> While the "70% of users bought also ... " could be generated by a
> suitable recommendation engine, I think it fits better with a frequent
> pattern mining approach i.e. Association Rules. I don't know if Amazon
> implements it that way, but it seems likely, since it's not really a
> personalized recommendation (unless we interpret the personalization as
> coming from the pages the user is visiting, i.e. real-time profile
> building).
>
> I believe Mahout has a frequent itemset mining algorithm (FPGrowth),
> though I've never tried it myself. For your problem, you would select
> the minimum support for your itemsets (this would eliminate spurious
> associations), and the confidence obtained would be directly your 70% value.
>
> Although your formulation selects only the rules with 1 item in the
> antecedent, i.e. item1 -> item2, you could use the items visited before
> to build bigger antecedents.
>
> Paulo
