On 05/12/12 00:53, Ted Dunning wrote:
Also, you have to separate UI considerations from algorithm considerations.
The algorithm that populates the recommendations is the recommender algorithm.
It has two responsibilities... first, find items that the users will like
and second, pick out a variety of less certain items to learn about. It is
not responsible for justifying choices to the user. The UI does that and
it may use analytics of some kind to make claims about choices made, but
that won't change the choices.
Here I disagree: explaining recommendations to the user is an important
factor in user acceptance (and therefore uptake) of the results, since
if she can understand why some completely unknown item was recommended
it'll make her more confident that it's a good choice (this has also
been shown experimentally). And the best one to know why something was
recommended is the engine itself. That's one good additional reason why
item-based neighbourhood methods are more advantageous than user-based
ones: you can communicate item neighbours to the user, who then sees
items she knows that are similar to the one being recommended (it's one
of the things Amazon does in its recommendation lists). You can achieve
more or less the same with matrix factorization approaches.
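
To make the "communicate item neighbours" idea concrete, here is a
minimal sketch in plain Python. It assumes the engine already has
item-item similarities available; the `explain` function, the
`similarity` dictionary and the toy items are all hypothetical, not
part of Mahout or any other library.

```python
def explain(recommended, user_items, similarity, top_n=3):
    """Return the user's own items that are most similar to `recommended`.

    similarity: dict mapping (item_a, item_b) -> score (hypothetical layout).
    """
    scored = [
        (similarity.get((recommended, item), 0.0), item)
        for item in user_items
    ]
    # Keep only items with a known positive similarity.
    scored = [(score, item) for score, item in scored if score > 0.0]
    # Highest score first; ties broken by item name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:top_n]]


# Toy similarities (made up for illustration).
similarity = {
    ("tripod", "camera"): 0.8,
    ("tripod", "sd_card"): 0.5,
}
user_items = {"camera", "sd_card", "novel"}

because_of = explain("tripod", user_items, similarity, top_n=2)
print("Recommended 'tripod' because you know:", because_of)
```

The UI can then render the returned items next to the recommendation,
without the engine changing which items it picked.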
Speaking of Amazon, the "also bought" UI element is still there on
their website, only *not* in their specific recommendation lists. It's
further down the page, in sections like "Continue Shopping: Customers
Who Bought Items in Your Recent History Also Bought". It does not give %
values now, but it's essentially the same (and it also works when you
are not logged in, since it is using your recent viewing history).
That's why I thought it's coming from Market Basket Analysis (i.e.
frequent itemsets).
Lift is indeed a good metric for the interestingness of a rule, but it
can also produce unreasonably large values for rare itemsets, since a
single co-occurrence of two rare items is divided by a very small
expected frequency. On the other hand, maybe this is good for uncovering
long tail associations.
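
A small sketch of that effect, in plain Python on made-up baskets (the
`lift_table` helper and the data are hypothetical, only meant to
illustrate the formula lift(a, b) = P(a, b) / (P(a) * P(b))):

```python
from collections import Counter
from itertools import combinations


def lift_table(baskets):
    """Return {(a, b): lift} for every unordered item pair that co-occurs."""
    n = len(baskets)
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    # lift(a, b) = P(a, b) / (P(a) * P(b))
    #            = n * count(a, b) / (count(a) * count(b))
    return {
        pair: n * c / (item_count[pair[0]] * item_count[pair[1]])
        for pair, c in pair_count.items()
    }


# 100 toy baskets: "bread" and "milk" are popular and associated,
# "rare_a" and "rare_b" appear exactly once, together.
baskets = (
    [["bread", "milk"]] * 40
    + [["bread"]] * 10
    + [["milk"]] * 10
    + [["eggs"]] * 39
    + [["rare_a", "rare_b"]]
)

lifts = lift_table(baskets)
for pair, value in sorted(lifts.items()):
    print(pair, value)
```

With this data the popular pair gets a lift of 1.6 while the one-off
pair gets 100, on the evidence of a single basket.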
Paulo
On Wed, Dec 5, 2012 at 12:48 AM, Sean Owen wrote:
Yes it's not a recommender problem, it's a most-similar-items
problem. Frequent itemset mining is really just a most-similar-items
algorithm, for one particular definition of similar (confidence). In
that sense they are nearly the same.
For frequent itemsets, you have to pick a minimum support -- that's an
extra parameter to figure out, but that is precisely what speeds up
frequent itemset mining. But, it will also mean you have no answer for
long-tail items since they are excluded by the min support.
If min support is 0, then you run into a different issue. For an item
that was bought by 1 person, anything else bought by that person has
confidence 1; it loses ability to discriminate. I just looked it up on
Wikipedia and found there's an idea of "lift" which would be better to
rank on. It's essentially a likelihood ratio. Which takes you right
back to Ted's advice to just find items with highest (log-)likelihood
similarity.
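
A rough sketch of the difference, in plain Python. The `llr` function
below is one common way to write the G^2 log-likelihood ratio for a
2x2 co-occurrence table; it is an illustration with made-up counts,
not Mahout's own implementation.

```python
import math


def confidence(k11, k12):
    """confidence(A -> B) = count(A and B) / count(A)."""
    return k11 / (k11 + k12)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11: baskets with A and B    k12: baskets with A but not B
    k21: baskets with B, not A   k22: baskets with neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero cell contributes nothing to the sum
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


# Made-up counts over 10,000 baskets; B is bought by 1,000 people.
one_buyer = (1, 0, 999, 9000)       # A bought once, together with B
many_buyers = (100, 0, 900, 9000)   # A bought 100 times, always with B

for name, table in (("one buyer", one_buyer), ("many buyers", many_buyers)):
    print(name, confidence(table[0], table[1]), llr(*table))
```

Both tables give confidence 1, but the log-likelihood ratio is far
larger for the item with 100 buyers, so ranking on it separates the
two cases.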
For these reasons I also suspect that this is not actually how Amazon
et al determine which items to show you. In fact, I don't see any such
"70% of users also bought..." figures on their site now?
On Tue, Dec 4, 2012 at 10:51 PM, Paulo Villegas wrote:
While the "70% of users also bought ... " could be generated by a
suitable recommendation engine, I think it fits better with a frequent
pattern mining approach i.e. Association Rules. I don't know if Amazon
implements it that way, but it seems likely, since it's not really a
personalized recommendation (unless we interpret the personalization as
coming from the pages the user is visiting, i.e. real-time profile
building).
I believe Mahout has a frequent itemset mining algorithm (FPGrowth),
though I've never tried it myself. For your problem, you would select
the minimum support for your itemsets (this would eliminate spurious
associations), and the resulting confidence would directly be your 70%
value.
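
As a rough illustration of that computation, here is a plain-Python
sketch for rules with a single item in the antecedent. It does not use
Mahout's FPGrowth; the `also_bought_rules` helper, the absolute
`min_support` count and the toy baskets are assumptions made for the
example.

```python
from collections import Counter
from itertools import permutations


def also_bought_rules(baskets, min_support):
    """Return {(item1, item2): confidence} for rules item1 -> item2.

    min_support is an absolute count of baskets containing both items.
    """
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        pair_count.update(permutations(items, 2))  # ordered pairs
    # confidence(item1 -> item2) = count(item1, item2) / count(item1)
    return {
        (a, b): c / item_count[a]
        for (a, b), c in pair_count.items()
        if c >= min_support
    }


# Toy baskets (made up): 10 camera buyers, 7 of whom also bought an SD card.
baskets = (
    [["camera", "sd_card"]] * 7
    + [["camera"]] * 3
    + [["sd_card", "tripod"]]
)

rules = also_bought_rules(baskets, min_support=2)
print(rules[("camera", "sd_card")])   # share of camera buyers who bought an SD card
print(("tripod", "sd_card") in rules)  # single co-occurrence, below min support
```

Here camera -> sd_card has confidence 0.7, while tripod -> sd_card
(confidence 1 from a single basket) is dropped by the minimum support.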
Although your formulation selects only the rules with 1 item in the
antecedent, i.e. item1 -> item2, you could use the items visited before
to build bigger antecedents.
Paulo