Yes. Correlation is a problem because tables like

    1    0
    0    10^6

and

    10   0
    0    10^6

produce the same correlation. LLR correctly distinguishes these cases.

On Mon, Aug 19, 2013 at 7:16 AM, Pat Ferrel <[email protected]> wrote:

> Which is why LLR would be really nice in the two-action cross-similarity
> case. The cross-correlation sparsification via cooccurrence is probably
> pretty weak, no?
>
>
> On Aug 18, 2013, at 11:53 AM, Ted Dunning <[email protected]> wrote:
>
> Outside of the context of your demo, suppose that you have events a, b, c
> and d. Event a is the one we are centered on and is relatively rare.
> Event b is not so rare, but has weak correlation with a. Event c is as
> rare as a, but correlates strongly with it. Event d is quite common, but
> has no correlation with a.
>
> The 2x2 matrices that you would get would look something like this. In
> each of these, a and NOT a are in rows while other and NOT other are in
> columns.
>
> versus b, llrRoot = 8.03
>
>             b        NOT b
>     a       10       10
>     NOT a   1000     99000
>
> versus c, llrRoot = 11.5
>
>             c        NOT c
>     a       10       10
>     NOT a   30       99970
>
> versus d, llrRoot = 0
>
>             d        NOT d
>     a       10       10
>     NOT a   50000    50000
>
> Note that what we are holding constant here is the prevalence of a (20
> times) and the distribution of a under the conditions of the other
> symbol. What is being varied is the distribution of the other symbol in
> the "NOT a" case.
>
>
> On Sun, Aug 18, 2013 at 10:50 AM, B Lyon <[email protected]> wrote:
>
> > Thanks folks for taking a look.
> >
> > I haven't sat down to try it yet, but wondering how hard it is to
> > construct (realizable and realistic) k11, k12, k21, k22 values for
> > three binary sequences X, Y, Z where (X,Y) and (Y,Z) have the same
> > co-occurrence, but you can tweak k12 and k21 so that the LLR values are
> > extremely different in both directions. I assume that k22 doesn't
> > matter much in practice since things are sparse and k22 is huge.
> > Well, obviously, I guess you could simply switch the k12/k21 values
> > between the two sequence pairs to flip the order at will... which is
> > information that co-occurrence of course does not "know about".
> >
> >
> > On Sat, Aug 17, 2013 at 10:30 PM, Ted Dunning <[email protected]>
> > wrote:
> >
> >> This is nice. As you say, k11 is the only part that is used in
> >> cooccurrence and it doesn't weight by prevalence, either.
> >>
> >> At this size it is hard to demonstrate much difference because it is
> >> hard to show interesting values of LLR without absurdly strong
> >> coordination between items.
> >>
> >>
> >> On Fri, Aug 16, 2013 at 8:21 PM, B Lyon <[email protected]> wrote:
> >>
> >>> As part of trying to get a better grip on recommenders, I have
> >>> started a simple interactive visualization that begins with the raw
> >>> data of user-item interactions and goes all the way to being able to
> >>> twiddle the interactions in a test user vector to see the impact on
> >>> recommended items. This is for the simple "user interacted with an
> >>> item" case rather than numerical preferences for items. The goal is
> >>> to show the intermediate pieces and how they fit together via popup
> >>> text on mouseovers and dynamic highlighting of the related pieces. I
> >>> am of course interested in feedback as I keep tweaking on it - not
> >>> sure I got all the terminology quite right yet, for example, and
> >>> might have missed some other things I need to know about. Note that
> >>> this material is covered in Chapter 6.2 in MIA in the discussion on
> >>> distributed recommenders.
> >>>
> >>> It's on googledrive here (very much a work-in-progress):
> >>>
> >>> https://googledrive.com/host/0B2GQktu-wcTiWHRwZFJacjlqODA/
> >>>
> >>> (apologies to small resolution screens)
> >>>
> >>> This is based only on the co-occurrence matrix, rather than including
> >>> the other similarity measures, although in working through this, it
> >>> seems that the other ones can just be interpreted as having
> >>> alternative definitions of what "*" means in the matrix
> >>> multiplication A^T*A, where A is the user-item matrix... and as an
> >>> aside, to me it begs the interesting question of [purely
> >>> hypothetical?] situations where LLR and co-occurrence are at odds
> >>> with each other in making recommendations, as co-occurrence seems to
> >>> be just using the "k11" term that is part of the LLR calculation.
> >>>
> >>> My goal (at the moment at least) is to eventually continue this for
> >>> the solr-recommender project that started a few weeks ago, where we
> >>> have the additional cross-matrix, as well as a kind of regrouping of
> >>> pieces for solr.
> >>>
> >>>
> >>> --
> >>> BF Lyon
> >>> http://www.nowherenearithaca.com
> >
> >
> > --
> > BF Lyon
> > http://www.nowherenearithaca.com
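The llrRoot values in the three tables above, and the correlation claim at the top of the thread, can be reproduced numerically. Below is a minimal Python sketch, assuming Dunning's G^2 log-likelihood ratio in the entropy formulation (the same shape as Mahout's LogLikelihood class) and using the phi coefficient as the 2x2 correlation measure; the function names are just for illustration:

```python
from math import log, sqrt

def xlogx(x):
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized entropy: xlogx(sum) minus the sum of xlogx(k)."""
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)

def root_llr(k11, k12, k21, k22):
    """Square root of Dunning's G^2 for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    llr = 2.0 * (row + col - mat)
    return sqrt(max(llr, 0.0))  # clamp tiny negative float noise

def phi(k11, k12, k21, k22):
    """Pearson correlation (phi coefficient) for a 2x2 table."""
    num = k11 * k22 - k12 * k21
    den = sqrt((k11 + k12) * (k21 + k22) * (k11 + k21) * (k12 + k22))
    return num / den

# The three tables from Ted's message:
print(root_llr(10, 10, 1000, 99000))   # versus b: ~8.03
print(root_llr(10, 10, 30, 99970))     # versus c: ~11.47
print(root_llr(10, 10, 50000, 50000))  # versus d: ~0

# Correlation cannot tell [1 0; 0 10^6] from [10 0; 0 10^6]; LLR can:
print(phi(1, 0, 0, 10**6), phi(10, 0, 0, 10**6))           # both 1.0
print(root_llr(1, 0, 0, 10**6), root_llr(10, 0, 0, 10**6)) # clearly different
```

Note that the "versus d" table is exactly independent (rows are proportional), so its G^2 is zero up to floating-point noise, which is why the llrRoot of 0 holds despite d being very common.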
