Yes.

Correlation is a problem because tables like

1 0
0 10^6

and

10 0
0 10^6

produce the same correlation.  LLR correctly distinguishes these cases.
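
To make the comparison concrete, here is a minimal sketch that computes both measures for a 2x2 table. The function names (phi, llr, llr_root) are made up for illustration and are not Mahout's API. It assumes that "correlation" means the Pearson (phi) coefficient of the two binary events, that LLR is the usual G^2 statistic, and that llrRoot is the square root of LLR, given a negative sign when the two events co-occur less often than expected. Tables with an empty row or column are not handled.

```python
import math


def phi(k11, k12, k21, k22):
    """Pearson correlation (phi coefficient) of two binary events."""
    r1, r2 = k11 + k12, k21 + k22
    c1, c2 = k11 + k21, k12 + k22
    return (k11 * k22 - k12 * k21) / math.sqrt(r1 * r2 * c1 * c2)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # treat 0 * log(0) as 0
            total += k * math.log(k * n / (rows[i] * cols[j]))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * total)


def llr_root(k11, k12, k21, k22):
    """Signed square root of LLR (sign convention is an assumption)."""
    root = math.sqrt(llr(k11, k12, k21, k22))
    # Negative when k11/(k11+k12) < k21/(k21+k22), i.e. fewer
    # co-occurrences than independence would predict.
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        root = -root
    return root


# The two tables from this message, then the three from the quoted reply.
tables = {
    "1 / 10^6": (1, 0, 0, 10**6),
    "10 / 10^6": (10, 0, 0, 10**6),
    "a versus b": (10, 10, 1000, 99000),
    "a versus c": (10, 10, 30, 99970),
    "a versus d": (10, 10, 50000, 50000),
}
for name, t in tables.items():
    print(f"{name:12s} phi = {phi(*t):.4f}  llrRoot = {llr_root(*t):.2f}")
```

Under these assumptions the first two tables get the same correlation but clearly different llrRoot values, and the three quoted tables come out close to the 8.03, 11.5 and 0 given in the thread.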



On Mon, Aug 19, 2013 at 7:16 AM, Pat Ferrel <[email protected]> wrote:

> Which is why LLR would be really nice in the two-action cross-similarity
> case.  The cross-correlation sparsification via cooccurrence is probably
> pretty weak, no?
>
>
> On Aug 18, 2013, at 11:53 AM, Ted Dunning <[email protected]> wrote:
>
> Outside of the context of your demo, suppose that you have events a, b, c
> and d.  Event a is the one we are centered on and is relatively rare.
> Event b is not so rare, but has weak correlation with a.  Event c is as
> rare as a, but correlates strongly with it.  Event d is quite common, but
> has no correlation with a.
>
> The 2x2 matrices that you would get would look something like this.  In
> each of these, a and NOT a are in rows while other and NOT other are in
> columns.
>
> versus b, llrRoot = 8.03
>
>              b    NOT b
>   a         10       10
>   NOT a   1000    99000
>
> versus c, llrRoot = 11.5
>
>              c    NOT c
>   a         10       10
>   NOT a     30    99970
>
> versus d, llrRoot = 0
>
>              d    NOT d
>   a         10       10
>   NOT a  50000    50000
>
> Note that what we are holding constant here is the prevalence of a (20
> times) and the distribution of a under the conditions of the other symbol.
> What is being varied is the distribution of the other symbol in the "NOT
> a" case.
>
>
>
>
> On Sun, Aug 18, 2013 at 10:50 AM, B Lyon <[email protected]> wrote:
>
> > Thanks folks for taking a look.
> >
> > I haven't sat down to try it yet, but wondering how hard it is to
> > construct (realizable and realistic) k11, k12, k21, k22 values for three
> > binary sequences X, Y, Z where (X,Y) and (Y,Z) have same co-occurrence,
> > but you can tweak k12 and k21 so that the LLR values are extremely
> > different in both directions.  I assume that k22 doesn't matter much in
> > practice since things are sparse and k22 is huge.  Well, obviously, I
> > guess you could simply switch the k12/k21 values between the two sequence
> > pairs to flip the order at will... which is information that
> > co-occurrence of course does not "know about".
> >
> >
> > On Sat, Aug 17, 2013 at 10:30 PM, Ted Dunning <[email protected]>
> > wrote:
> >
> >> This is nice.  As you say, k11 is the only part that is used in
> >> cooccurrence and it doesn't weight by prevalence, either.
> >>
> >> An analysis of this size makes it hard to demonstrate much difference
> >> because it is hard to show interesting values of LLR without absurdly
> >> strong coordination between items.
> >>
> >>
> >> On Fri, Aug 16, 2013 at 8:21 PM, B Lyon <[email protected]> wrote:
> >>
> >>> As part of trying to get a better grip on recommenders, I have started a
> >>> simple interactive visualization that begins with the raw data of
> >>> user-item interactions and goes all the way to being able to twiddle the
> >>> interactions in a test user vector to see the impact on recommended
> >>> items.  This is for simple "user interacted with an item" case rather
> >>> than numerical preferences for items.  The goal is to show the
> >>> intermediate pieces and how they fit together via popup text on
> >>> mouseovers and dynamic highlighting of the related pieces.  I am of
> >>> course interested in feedback as I keep tweaking on it - not sure I got
> >>> all the terminology quite right yet, for example, and might have missed
> >>> some other things I need to know about.  Note that this material is
> >>> covered in Chapter 6.2 in MIA in the discussion on distributed
> >>> recommenders.
> >>>
> >>> It's on googledrive here (very much a work-in-progress):
> >>>
> >>> https://googledrive.com/host/0B2GQktu-wcTiWHRwZFJacjlqODA/
> >>>
> >>> (apologies to small resolution screens)
> >>>
> >>> This is based only on the co-occurrence matrix, rather than including the
> >>> other similarity measures, although in working through this, it seems
> >>> that the other ones can just be interpreted as having alternative
> >>> definitions of what "*" means in matrix multiplication of A^T*A, where A
> >>> is the user-item matrix... and as an aside, to me this raises the
> >>> interesting question of [purely hypothetical?] situations where LLR and
> >>> co-occurrence are at odds with each other in making recommendations, as
> >>> co-occurrence seems to be just using the "k11" term that is part of the
> >>> LLR calculation.
> >>>
> >>> My goal (at the moment at least) is to eventually continue this for the
> >>> solr-recommender project that started a few weeks ago, where we have the
> >>> additional cross-matrix, as well as a kind of regrouping of pieces for
> >>> solr.
> >>>
> >>>
> >>> --
> >>> BF Lyon
> >>> http://www.nowherenearithaca.com
> >>>
> >>
> >
> >
> >
> > --
> > BF Lyon
> > http://www.nowherenearithaca.com
> >
>
>
