Yes, it's not a recommender problem; it's a most-similar-items problem. Frequent itemset mining is really just a most-similar-items algorithm for one particular definition of "similar" (confidence). In that sense they are nearly the same.
For frequent itemsets, you have to pick a minimum support. That's an extra parameter to figure out, but it is precisely what speeds up frequent itemset mining. It also means you have no answer for long-tail items, since they are excluded by the min support.

If min support is 0, you run into a different issue. For an item that was bought by only 1 person, anything else that person bought has confidence 1, so confidence loses its ability to discriminate. I just looked it up on Wikipedia and found there's an idea of "lift", which would be better to rank on. It's essentially a likelihood ratio, which takes you right back to Ted's advice to just find the items with the highest (log-)likelihood similarity.

For these reasons I also suspect that this is not actually how Amazon et al. determine which items to show you. In fact, I don't see any such "70% of users also bought..." figures on their site now?

On Tue, Dec 4, 2012 at 10:51 PM, Paulo Villegas <[email protected]> wrote:
> While the "70% of users bought also ... " could be generated by a
> suitable recommendation engine, I think it fits better with a frequent
> pattern mining approach i.e. Association Rules. I don't know if Amazon
> implements it that way, but it seems likely, since it's not really a
> personalized recommendation (unless we interpret the personalization as
> coming from the pages the user is visiting, i.e. real-time profile
> building).
>
> I believe Mahout has a frequent itemset mining algorithm (FPGrowth),
> though I've never tried it myself. For your problem, you would select
> the minimum support for your itemsets (this would eliminate spurious
> associations), and the confidence obtained would be directly your 70% value.
>
> Although your formulation selects only the rules with 1 item in the
> antecedent, i.e. item1 -> item2, you could use the items visited before
> to build bigger antecedents.
>
> Paulo
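
To make the confidence-versus-lift point concrete, here is a rough Python sketch on made-up baskets. Everything in it is my own illustration rather than Mahout's API: the toy data, the function names, and the `llr` helper, which follows the entropy form of Dunning's log-likelihood ratio as I understand it. "caviar" is the long-tail item, bought by a single user.

```python
from collections import Counter
from itertools import permutations
from math import log

# Toy purchase data (invented for illustration): one set of items per user.
baskets = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
    {"bread", "caviar", "champagne"},
    {"champagne", "eggs"},
    {"bread", "milk"},
]

n = len(baskets)
# Number of users who bought each item.
item_count = Counter(item for b in baskets for item in b)
# Number of users who bought both items of each ordered pair (a, b), a != b.
pair_count = Counter(p for b in baskets for p in permutations(b, 2))


def confidence(a, b):
    """Fraction of a's buyers who also bought b, i.e. an estimate of P(b | a)."""
    return pair_count[(a, b)] / item_count[a]


def lift(a, b):
    """confidence(a -> b) divided by the overall share of users who bought b."""
    return confidence(a, b) / (item_count[b] / n)


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(a, b):
    """Log-likelihood ratio computed from the 2x2 co-occurrence table of a and b."""
    k11 = pair_count[(a, b)]      # users who bought both a and b
    k12 = item_count[a] - k11     # bought a but not b
    k21 = item_count[b] - k11     # bought b but not a
    k22 = n - k11 - k12 - k21     # bought neither
    return 2.0 * (
        entropy(k11 + k12, k21 + k22)
        + entropy(k11 + k21, k12 + k22)
        - entropy(k11, k12, k21, k22)
    )


# Compare how each measure scores a popular and a rare co-purchase of caviar.
for other in ("bread", "champagne"):
    print(
        other,
        round(confidence("caviar", other), 3),
        round(lift("caviar", other), 3),
        round(llr("caviar", other), 3),
    )
```

With min support at 0, confidence scores bread and champagne identically for caviar, because the one caviar buyer bought both. Lift and the log-likelihood ratio both put champagne ahead, since bread is popular with everyone. With a single buyer the evidence is still thin, which the small LLR values reflect.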
